Home Forums Main Forums SAS Forum How to detect and drop dataset variables that have only missing values??

  • How to detect and drop dataset variables that have only missing values??

     Datura updated 3 years, 9 months ago 1 Member · 1 Post
  • Datura

    Member
    February 1, 2021 at 12:41 pm

    Question: I have a big data set with thousands of variables, some variables may have nothing but missing values. How can I detect these variables and drop them to make processing efficient?

    Method 1: We can use custom formats along with Proc Freq to do it.

    data class;
    if 0 then set sashelp.class;
    do i=1 to 10;
    output;
    end;
    stop;
    run;

    proc format;
    value allmiss
    ._-.z=.
    other=1;
    value $allmiss
    ' '=' '
    other='1';
    run;

    ods select nlevels;
    ods output nlevels=nlevels(keep=TableVar NNonMissLevels where=( NNonMissLevels=0));

    proc freq levels;
    format _character_ $allmiss. _numeric_ allmiss.;
    run;
    ods output close;

    Method 2: Use SAS Macro

    http://support.sas.com/resources/papers/proceedings10/048-2010.pdf

    Introduction:
    Sometimes all the values of some variables in a SAS® dataset are missing or null, and we would like to drop these variables to save disk space. Here we introduce a macro %DROPMISS that can automatically identify and drop SAS variables that have ONLY missing or null values. You have the option of not dropping the variables you do not want to drop by using a parameter in the %DROPMISS macro.

    %DROPMISS MACRO
    The code for %DROPMISS macro is given in the Appendix. Here we will show how to use the %DROPMISS macro using the following dataset DSIN.

     data DSIN; 
    input a s d z $ c $ x $ y ;
    datalines;
    1 . 3 ab pq . .
    . . . . xy . .
    3 . 3 ln . . .
    ;
    run;

    In the dataset DSIN, all the values of the numeric variable S and Y are null, and all the values of the character variable X are missing. If we want to drop all the variables that are always missing or null, we will use the %DROPMISS macro as follows.

    %DROPMISS(DSIN, DSOUT)

    When you run this macro, output dataset DSOUT will be produced. The dataset DSOUT is the same as DSIN except that it will not have the variables s, y, and x. These variables are always missing or always null in the dataset DSIN.
    But, if we want to keep the variables y and x even though they are always missing or null, then you use the %DROPMISS as follows.
    %DROPMISS(DSIN,DSOUT, nodrop= Y X)

    Only the variable S will be dropped.
    If you do not want to drop any of the numeric variables, you can use %DROPMISS (DSIN,DSOUT, nodrop= _NUMERIC_ ). If you do not want to drop any of the character variables, you can use %DROPMISS (DSIN,DSOUT, nodrop= _CHARACTER_ ).

    CONCLUSION
    %DROPMISS macro is very efficient, and could save you a huge amount of space if some of the variables in a SAS dataset are always missing or null and they are dropped.

Log in to reply.

Original Post
0 of 0 posts June 2018
Now