
  • Useful LAG and DIF Functions in SAS

  • Datura

    Member
    February 1, 2021 at 12:48 pm

    The SAS DATA step provides two functions, LAG and DIF, for accessing previous values of a variable or expression. These functions are useful for computing lags and differences of series.

    For example, the following statements add the variables CPILAG and CPIDIF to the USCPI data set. The variable CPILAG contains lagged values of the CPI series. The variable CPIDIF contains the changes of the CPI series from the previous period; that is, CPIDIF is CPI minus CPILAG. The new data set is shown in part in Figure 3.16.

        data uscpi;
            set uscpi;
            cpilag = lag( cpi );    /* CPI passed on the previous call, i.e. the previous observation here */
            cpidif = dif( cpi );    /* CPI minus CPILAG */
        run;

        proc print data=uscpi;
        run;

    Note: The first row will have missing values for CPILAG and CPIDIF.
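Because the USCPI data set is not included in the post, here is a self-contained sketch of the same two statements run on a small, made-up series. The data set name CPI_DEMO, the variable PERIOD and the CPI values are hypothetical. With these values, CPILAG comes out as ., 100, 100.5, 101.5 and CPIDIF as ., 0.5, 1, -0.5.

```sas
/* Hypothetical CPI values, made up for this illustration */
data cpi_demo;
    input period cpi;
    datalines;
1 100.0
2 100.5
3 101.5
4 101.0
;
run;

data cpi_demo;
    set cpi_demo;
    cpilag = lag(cpi);   /* CPI passed on the previous call (here: the previous row) */
    cpidif = dif(cpi);   /* equivalent to cpi - lag(cpi) */
run;

proc print data=cpi_demo;
run;
```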

    (1) Understanding the DATA Step LAG and DIF Functions

    When used in this simple way, LAG and DIF act as lag and difference functions. However, it is important to keep in mind that, despite their names, the LAG and DIF functions available in the DATA step are not true lag and difference functions.

    Rather, LAG and DIF are queuing functions that remember and return argument values from previous calls. The LAG function remembers the value you pass to it and returns as its result the value you passed to it on the previous call. The DIF function works the same way but returns the difference between the current argument and the remembered value. (LAG and DIF return a missing value the first time the function is called.)
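A minimal sketch of this queue behaviour, using made-up data (the data set QUEUE_DEMO and the variables OBS, X, COND_LAG and TRUE_LAG are hypothetical). The first LAG is called only on even-numbered observations, so its queue is updated only on those calls; each LAG call in the code keeps its own separate queue.

```sas
/* Hypothetical data: obs = 1..6, x = 10, 20, ..., 60 */
data queue_demo;
    do obs = 1 to 6;
        x = obs * 10;
        output;
    end;
run;

data queue_demo;
    set queue_demo;
    /* Conditional call: the queue only sees x on even observations */
    if mod(obs, 2) = 0 then cond_lag = lag(x);
    /* Unconditional call: the queue sees x on every observation */
    true_lag = lag(x);
run;
```

On observation 4, COND_LAG is 20 (the argument of the previous call, made on observation 2), not 30, while TRUE_LAG is 30.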

    A true lag function does not return the value of the argument for the “previous call,” as do the DATA step LAG and DIF functions. Instead, a true lag function returns the value of its argument for the “previous observation,” regardless of the sequence of previous calls to the function. Thus, for a true lag function to be possible, it must be clear what the “previous observation” is.
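When the "previous observation" should be defined within groups, one common way to get true-lag behaviour is to call LAG and DIF unconditionally on every observation and then discard the result on the first observation of each BY group. A sketch with hypothetical panel data (PANEL, PANEL_LAG, ID, PERIOD, VALUE, PREV_VALUE and CHANGE are made-up names):

```sas
/* Hypothetical panel data: two series identified by ID */
data panel;
    input id $ period value;
    datalines;
A 1 5
A 2 7
A 3 6
B 1 20
B 2 23
;
run;

proc sort data=panel;
    by id period;
run;

data panel_lag;
    set panel;
    by id;
    prev_value = lag(value);   /* call LAG on every observation so its queue stays in step */
    change     = dif(value);
    if first.id then do;       /* the previous row belongs to another ID, so blank it out */
        prev_value = .;
        change     = .;
    end;
run;
```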

    (2) Alternative Solutions
    You can also calculate lags and differences in the DATA step without using the LAG and DIF functions. For example, the following statements add the variables CPILAG and CPIDIF to the USCPI data set:

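        data uscpi;
            set uscpi;
            retain cpilag;            /* keep CPILAG across iterations */
            cpidif = cpi - cpilag;    /* CPILAG still holds the previous CPI here */
            output;                   /* write the observation before CPILAG is updated */
            cpilag = cpi;             /* remember the current CPI for the next iteration */
        run;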

    The RETAIN statement prevents the DATA step from reinitializing CPILAG to a missing value at the start of each iteration and thus allows CPILAG to retain the value of CPI assigned to it in the last statement.

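    The OUTPUT statement causes the output observation to contain the values of the variables before CPILAG is reassigned the current value of CPI in the last statement. Because the step contains an explicit OUTPUT statement, SAS does not write another observation at the end of each iteration.

    Note: This is the approach that must be used if you want to build a variable that is a function of its own previous values (its own lags).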
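As an illustration of a variable that depends on its own previous value, here is a minimal sketch of simple exponential smoothing with RETAIN. The data set names SERIES and SMOOTHED, the variables T, Y and S, and the smoothing weight 0.5 are all hypothetical. Writing s = 0.5*y + 0.5*lag(s) instead would not work: S is reset to missing at the start of each iteration, so LAG(s) would only ever store and return missing values.

```sas
/* Hypothetical series */
data series;
    input t y;
    datalines;
1 10
2 20
3 30
4 40
;
run;

data smoothed;
    set series;
    retain s;                       /* S keeps its value from the previous iteration */
    if _n_ = 1 then s = y;          /* start the recursion at the first value */
    else s = 0.5 * y + 0.5 * s;     /* s(t) = 0.5*y(t) + 0.5*s(t-1) */
run;
```

With these values, S is 10, 15, 22.5 and 31.25.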

    (3) LAGn and DIFn

    LAGn and DIFn generalize LAG and DIF to the nth lag.

    Syntax: LAGn(X) / DIFn(X)

    Arguments
    n: specifies the number of lags.
    X: specifies a numeric constant, variable, or expression (LAGn also accepts a character argument; DIFn requires a numeric one).

    Details
    The DIF functions, DIF1, DIF2, …, DIF100, return the first differences between the argument and its nth lag. DIF1 can also be written as DIF, just as LAG1 can be written as LAG. DIFn is defined as DIFn(X) = X - LAGn(X).
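A short sketch of LAG2 and DIF2 on a made-up series (the data set LAGN_DEMO and the variables X, LAG2_X and DIF2_X are hypothetical). Each value of DIF2_X equals X minus LAG2_X, and the first two observations are missing because the queue is not yet full.

```sas
/* Hypothetical series x = 3, 5, 9, 15, 23 */
data lagn_demo;
    input x;
    lag2_x = lag2(x);   /* x from two calls ago */
    dif2_x = dif2(x);   /* x - lag2(x) */
    datalines;
3
5
9
15
23
;
run;
```

Here LAG2_X is ., ., 3, 5, 9 and DIF2_X is ., ., 6, 10, 14.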

  • Datura

    Member
    February 1, 2021 at 1:04 pm

    Look Ahead: Opposite of the LAG Function
    We can use two SET statements on the same data set, the first one starting at the second observation, to construct the opposite of the LAG function, namely a “look ahead”.

        Data A;
            Input name $ sales;
            cards;
        Alice 100
        Jenny 265
        Lynne 785
        Zane 963
        Mary 612
        ;
        run;

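        Data CCC;
            /* Read A from its 2nd observation: its SALES becomes the "next" value */
            Set A (firstobs=2 rename=(sales=Next_sales)) END=EOF NOBS=I;
            /* Read A from the start: the current observation */
            Set A;
            DIF = Next_sales - sales;
            Output;
            /* The first SET reaches its last observation one iteration early,
               so read the last observation of A directly and output it too */
            If EOF then do;
                Set A point=I;
                Next_sales = .;
                DIF = Next_sales - sales;
                Output;
            End;
        Run;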

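    Watch out at the end: the first SET statement reads one observation ahead, so it sets EOF=1 when it reads the last observation of A, while the second SET A is still one observation behind. On the next iteration the first SET has nothing left to read and the DATA step stops, so the last observation (Mary) would never be read by the second SET A. The IF EOF block handles this: a third SET A with POINT=I (I holds the number of observations, from NOBS=) reads that last observation directly, Next_sales is set to missing, and the row is output. The explicit OUTPUT statements are necessary because the final iteration writes two observations (Zane, then Mary); with only the implicit output at the end of the step, Zane's row would be lost.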
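A commonly used alternative for the same look-ahead, offered here as a different technique rather than the method above, is a one-to-one MERGE (no BY statement) of A with a copy of itself that starts at the second observation. When the shorter, shifted copy runs out, SAS sets Next_sales to missing on the last row, so no end-of-file handling is needed. The sketch recreates data set A from the post so it runs on its own; the output data set name LEAD is hypothetical.

```sas
/* Same data as data set A in the post */
data A;
    input name $ sales;
    datalines;
Alice 100
Jenny 265
Lynne 785
Zane 963
Mary 612
;
run;

data lead;
    /* One-to-one merge without BY: row i of A is paired with row i+1.
       KEEP= is applied before RENAME=, so it names the original variable. */
    merge A
          A(firstobs=2 keep=sales rename=(sales=Next_sales));
    DIF = Next_sales - sales;   /* missing on the last row */
run;
```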
