Useful LAG and DIF Functions in SAS

Datura

Member
February 1, 2021 at 12:48 pm
The SAS DATA step provides two functions, LAG and DIF, for accessing previous values of a variable or expression. These functions are useful for computing lags and differences of a series.

For example, the following statements add the variables CPILAG and CPIDIF to the USCPI data set. The variable CPILAG contains lagged values of the CPI series. The variable CPIDIF contains the changes of the CPI series from the previous period; that is, CPIDIF is CPI minus CPILAG. The new data set is shown in part in Figure 3.16.
```
    data uscpi;
       set uscpi;
       cpilag = lag( cpi );
       cpidif = dif( cpi );
    run;
    proc print data=uscpi;
    run;
```
Note: The first row will have missing values for CPILAG and CPIDIF.

Understanding the DATA Step LAG and DIF Functions

When used in this simple way, LAG and DIF act as lag and difference functions. However, it is important to keep in mind that, despite their names, the LAG and DIF functions available in the DATA step are not true lag and difference functions.

Rather, LAG and DIF are queuing functions that remember and return argument values from previous calls. The LAG function remembers the value you pass to it and returns as its result the value you passed to it on the previous call. The DIF function works the same way but returns the difference between the current argument and the remembered value. (LAG and DIF return a missing value the first time the function is called.)

A true lag function does not return the value of the argument for the “previous call,” as do the DATA step LAG and DIF functions. Instead, a true lag function returns the value of its argument for the “previous observation,” regardless of the sequence of previous calls to the function. Thus, for a true lag function to be possible, it must be clear what the “previous observation” is.
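The distinction matters as soon as LAG is not called on every observation. Here is a minimal sketch using a made-up data set DEMO with hypothetical variables X and FLAG (none of these names come from the USCPI example):

```sas
data demo;
   input x flag;
   datalines;
10 0
20 1
30 0
40 1
;
run;

data pitfall;
   set demo;
   /* LAG runs only on rows where FLAG = 1, so its queue only ever sees 20 and 40 */
   if flag = 1 then lag_cond = lag(x);
   /* LAG runs on every row, so here "previous call" and "previous observation" coincide */
   lag_all = lag(x);
run;

proc print data=pitfall;
run;
```

On the fourth observation, LAG_COND is 20 (the argument from the previous call, which happened on the second observation), whereas LAG_ALL is 30 (the value from the previous observation). A common way to avoid the surprise is to call LAG unconditionally and apply any condition to its result afterwards.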

(2) Alternative Solutions
You can also calculate lags and differences in the DATA step without using LAG and DIF functions. For example, the following statements add the variables CPILAG and CPIDIF to the USCPI data set:
```
    data uscpi;
       set uscpi;
       retain cpilag;     
       cpidif = cpi - cpilag;       
       output;
       cpilag = cpi;
    run;
```
The RETAIN statement prevents the DATA step from reinitializing CPILAG to a missing value at the start of each iteration and thus allows CPILAG to retain the value of CPI assigned to it in the last statement.

The OUTPUT statement causes the output observation to contain values of the variables before CPILAG is reassigned the current value of CPI in the last statement.
Note: This is the approach you must use if you want to build a variable that is a function of its own previous values.
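As an illustration of a variable that depends on its own previous value, here is a rough sketch of an exponentially smoothed CPI series built with RETAIN. The output data set name SMOOTH, the variable EMA, and the weight 0.3 are all assumptions chosen for the example, and the sketch assumes CPI has no missing values:

```sas
data smooth;
   set uscpi;
   retain ema;                        /* keep EMA across iterations */
   if _n_ = 1 then ema = cpi;         /* seed with the first observation */
   else ema = 0.3*cpi + 0.7*ema;      /* right-hand EMA is the previous row's value */
run;
```

Because EMA is retained, the value on the right-hand side of the assignment is the one computed on the previous observation, which is exactly the recursion wanted.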

(3) LAGn and DIFn

These functions generalize LAG and DIF to lags of more than one period.

Syntax: LAGn(X), DIFn(X)

Arguments
n: specifies the number of lags. It is written as part of the function name (1 to 100), not passed as an argument.
X: specifies a numeric constant, variable, or expression.
Details
The DIF functions, DIF1, DIF2, ..., DIF100, return the difference between the argument and its nth lag. DIF1 can also be written as DIF. DIFn is defined as DIFn(X) = X - LAGn(X). Note that DIF2(X) is therefore the change over two periods, not the second difference DIF(DIF(X)).
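A short sketch of the numbered functions applied to the USCPI data set. The output data set name USCPI_LAGS and the new variable names are made up for this example, and the 12-period lag assumes the series is monthly, so that 12 observations back means one year back:

```sas
data uscpi_lags;
   set uscpi;
   cpilag12 = lag12(cpi);      /* CPI from 12 calls (here, 12 observations) earlier */
   cpidif12 = dif12(cpi);      /* cpi - lag12(cpi): change over 12 periods */
   cpi2dif  = dif(dif(cpi));   /* second difference, which is not the same as dif2(cpi) */
run;
```

The first 12 observations have missing CPILAG12 and CPIDIF12, because the queue has not yet filled.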
Datura

Member
February 1, 2021 at 1:04 pm
Look Ahead — Opposite of LAG function
We can use SET statements to construct the opposite of the LAG function, namely a “look ahead”.
```
 Data A;
 Input name $ sales;
 cards;
 Alice  100
 Jenny  265
 Lynne  785
 Zane   963
 Mary   612
 ;
 run;
```
```
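 Data CCC;
    /* Look-ahead stream: starts at obs 2, so Next_sales is the following row's sales */
    Set A (firstobs=2 rename=(sales=Next_sales)) END=EOF NOBS=I;
    /* Current stream: starts at obs 1 and overwrites name with the current row's name */
    Set A;
    DIF = Next_sales - sales;
    Output;
    /* The look-ahead stream ends one row early, so read the last row directly */
    If EOF then do;
       Set A point=I;
       Next_sales = .;
       DIF = Next_sales - sales;   /* missing, since there is no next row */
       Output;
    End;
 Run;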
```
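Watch out at the end: the first SET A statement (the one with FIRSTOBS=2) reaches the end of the file one iteration before the second SET A has read the last observation, and the DATA step stops as soon as a SET statement tries to read past the end of its input. Without the IF EOF block, the last observation of A would never be written out. The block reads that observation directly with POINT=, sets Next_sales to missing, and outputs it. This is why the explicit OUTPUT statements are necessary.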
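A more compact variant of the same look-ahead is sketched below. It is an alternative, not the approach used in this post: it makes the look-ahead read conditional, so no POINT= or explicit OUTPUT is needed. It assumes the data set A created earlier; the output name CCC2 is made up for the example:

```sas
data ccc2;
   set a end=eof;
   /* Read the next row's sales only while a next row exists */
   if not eof then
      set a (firstobs=2 keep=sales rename=(sales=next_sales));
   else next_sales = .;   /* last row: clear the value retained from the previous read */
   dif = next_sales - sales;
run;
```

The KEEP=SALES option matters here: without it, the second SET would overwrite NAME with the next row's name.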
