SAS: what is the function of multiple SET statements in a DATA step?
-
My former student Grace once asked me this question. It is a good one, and I am happy to answer it and share the answer with others.
She asked: “Given the SAS code below:
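/* custom_id uses list input; product is read with the $12. informat, */
/* so the embedded blank in values such as 'pentium IV' is kept       */
data con1;
   input custom_id $ product $12.;
   cards;
28901 pentium IV
36815 pentium III
;
run;

data con2;
   input custom_id $ product $12.;
   cards;
18601 pentium IV
24683 pentium III
851921 pentium IV
61831 pentium IV
;
run;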
data con3;
   set con1;
   set con2;
run;
The result is:
custom_id   product
18601       pentium IV
24683       pentium III
However, if we change the code as follows:
data con3;
   set con2;
   set con1;
run;
The result will be:
custom_id   product
28901       pentium IV
36815       pentium III
How come? What is the mechanism by which con1 overwrites con2? And what is the function of multiple SET statements in a DATA step?”
-
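In a DATA step, multiple SET statements do not append data sets the way a single SET statement listing several data sets does (for example, set con1 con2;). Instead, on each iteration of the DATA step, every SET statement reads the next observation from its own data set into the program data vector (PDV); SAS calls this one-to-one reading. When the data sets share variable names, the values read by a later SET statement replace the values read by an earlier one, so each output observation carries the values from the last SET. (Variables that exist in only one of the data sets are not overwritten.)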
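A quick way to see the overwrite happen is to write the PDV to the log after each SET statement. The sketch below is illustrative only: the data set names first and second, and their variables id and src, are hypothetical and not part of Grace's example. The last step lists both data sets in one SET statement for contrast, which appends them instead.

```sas
/* Hypothetical inputs: both data sets have the variables src and id */
data first;
   length src $6;
   do id = 1 to 2;
      src = 'first';
      output;
   end;
run;

data second;
   length src $6;
   do id = 11 to 14;
      src = 'second';
      output;
   end;
run;

/* Two SET statements: one-to-one reading, the later SET replaces common variables */
data overwritten;
   set first;
   putlog 'After SET first:  ' _n_= id= src=;
   set second;
   putlog 'After SET second: ' _n_= id= src=;
run;

/* One SET statement listing both data sets: concatenation (2 + 4 observations) */
data appended;
   set first second;
run;
```

If the sketch behaves as described above, overwritten should contain two observations, both with src='second' (id 11 and 12), while appended should contain all six. On the third iteration SET first finds nothing left to read, so the step stops before either PUTLOG line runs.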
Another key point is when the DATA step stops and how many observations it produces. Consider the example below, where A and B have exactly the same variables.
data C;
   set A;   * 5 records;
   set B;   * 10 records;
run;
data D;
   set B;   * 10 records;
   set A;   * 5 records;
run;
Both C and D have only 5 records, but the records are different. Why?
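Each SET statement keeps track of whether it has reached the end of its data set (you can monitor this with the END= option: with END=EOF, the variable EOF is set to 1 when the last observation is read). The DATA step stops as soon as any SET statement tries to read past the end of its data set, no matter which data set runs out first. In the above case, A has only 5 observations, so it always reaches the end first: on the sixth iteration SET A finds nothing left to read, and the step ends before that iteration is written out. A therefore determines the number of observations in the output data set: 5. In summary, with multiple SET statements that all execute on every iteration (and the implicit OUTPUT), the number of output observations equals the number of observations in the smallest input data set.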
Although C and D have the same number of observations, their records differ, because the later SET statement always overwrites the values read by the earlier one: C contains the first 5 observations of B, while D contains the 5 observations of A. The ORDER does matter!
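The A/B example can be made concrete with the sketch below. Only the observation counts (5 and 10) come from the example; the id values and the src variable are hypothetical. In data D, the END= option names a variable (lastA, also a hypothetical name) that SAS sets to 1 when SET A reads the last observation of A, and the PUTLOG lines show where the step stops.

```sas
/* Hypothetical A: 5 observations */
data A;
   length src $1;
   do id = 1 to 5;
      src = 'A';
      output;
   end;
run;

/* Hypothetical B: 10 observations with the same variables */
data B;
   length src $1;
   do id = 101 to 110;
      src = 'B';
      output;
   end;
run;

/* SET A, then SET B: values from B win */
data C;
   set A;
   set B;
run;

/* SET B, then SET A: values from A win */
data D;
   set B;
   putlog 'Read from B: ' _n_= id=;
   set A end=lastA;   /* lastA = 1 once the last observation of A is read */
   if lastA then putlog 'Last observation of A read on ' _n_=;
run;
```

For D, the log should show B's sixth observation (id=106) being read on _N_=6, after which SET A has nothing left to read and the step stops without writing that iteration out. C should hold ids 101 to 105 from B, and D ids 1 to 5 from A.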