
If you want to find a topic in the book quickly, you may use the index at the back. It is not too hard to understand how a SAS index can help you to directly access the observations that you need in a particular SAS data set.
Here is an example how you can use the index with double-set in SAS.
1. Code
Code in gist
/*----------------------------------------------------------
Index and Double-set
------------------------------------------------------------
This example is to demonstrate the use of proc
datasets (index) and double-set.
------------------------------------------------------------
The first data set TRANSACTION is a lookup
table.
The second data set MASTER is to get multiple
observations per lookup observation.
------------------------------------------------------------
This program is modifed from Kevin McKinney
----------------------------------------------------------*/
* TRANSACTION as lookup table, with multiple
records;
data TRANSACTION; input ein year quarter dataitem $2.; datalines; 170000000000 1990 1 a 170000000000 1990 1 b 170000000001 1991 1 c 170000000001 1992 4 d 170000000000 1990 1 e 170000000071 1992 4 f 170000000449 1992 3 g 170000000528 1990 1 h 170000000798 1991 2 i 170000001003 1992 1 j ; run; proc print; run; * MASTER has no unique Key; data MASTER; input ein year quarter who $6.; datalines; 170000000000 1990 1 one 170000000000 1991 1 two 170000000449 1992 3 three 170000000449 1992 3 four 170000001003 1992 1 five 170000001003 1992 1 six ; run; proc print; run; * Create index for MASTER; proc datasets lib = work; modify MASTER; index create ein; run;
data out;
set TRANSACTION;
continue=0;
continue1=0;
do while ( continue=0 );
set MASTER key=ein;
if continue1=1 then continue=1;
if _iorc_ = 0 then do;
output;
end;
else do;
_error_=0;
ein= 00000000000;
continue1=1;
end;
end;
run;
proc print;
run;
2. Result


3. Discussion
What is the purpose of this example?
- To show you a SAS Tip in table lookup operation with index and double-set
- Table lookup operations are often the most time consuming part of many SAS programs
- DATA step MERGE statement
- SQL inner join
- SQL subquery
- FORMAT procedure and PUT function
- SET statements and KEY = option
- CALL EXECUTE routine
- SQL INTO clause
- The primary file (MASTER) is the file for which you want to obtain auxiliary information.
- The lookup file (TRANSACTION) is an auxiliary file that is maintained separately from the primary file and that is referenced for one or more of the observations of the primary file.
- The key variable (ein)is the variable or variables whose values are the common elements between the primary file and the lookup file. Typically, key values are unique in the lookup file but not necessarily unique in the primary file.
- The lookup result (out) is the auxiliary information obtained using the key variable or variables as reference into the lookup file.
- proc datasets (options: modify, index)
- double-set in the data step
- If you want to extract a small subset (1% to 15% of the total number of observations) from a large SAS data set, it's good to use the index. Or if you predict that you would access a SAS data set very often via an index, then that data set is a good index candidate. In doing so, the amount of computer resources (CPU time, I/O’s, Elapsed time, etc.) expended should be less than if SAS read the entire data set sequentially.
- proc datasets is one of the three methods to create index (proc sql, data step and proc datasets).
- The dataset in the first set statement(TRANSACTION, lookup) is not a unique list of key values. And the dataset in the second set statement (MASTER) does NOT have a unique set of key values. The KEY (ein) is repeated for observation one, two, and five. So this requires resetting the pointer on each iteration of the data step, and two additional lookups in the MASTER dataset. Once to know when the end of all the records for that EIN is reached. Once more to reset the pointer. The auxiliary variables continue and continue1 will create a supplementary condition within the do while loop. These two auxiliary variables together with double-set would yield the result we want.
From Bancova.com



