sasdatastep

How do I add a variable to a dataset within a SET statement in a data step?


I have a SAS data step that uses a %do loop to build a set statement that combines multiple files. The files have names of the form datafile2020q1 or datafile2021q3, and they're each stored in libraries with names of the form DataLibrary2020 or DataLibrary2021.

%macro processclaimsdata;
data rawclaimsdata (keep=&claimvariables.);
set
    %do yr=2019 %to 2021;
        %do qrtr=1 %to 4;
            DataLibrary&yr..datafile&yr.q&qrtr. (keep=&claimvariables.)
        %end;
    %end;
;
run;
%mend;

My goal is to add a variable to the dataset that has YYYYqQ in it, e.g., 2020q1 or 2021q3, for each dataset, so I can keep track of which file it came from. Is this possible by modifying my code above, or do I need to rework it to use proc append and/or proc sql?


Solution

  • Use the indsname= option to keep track of the dataset that a row came from. Although it won't be in the format you specify, this will tell you the exact dataset name that a row came from.

    The code below will create a new temporary variable in the dataset called dsn that stores the name of the dataset a row came from. You can add it to the dataset by assigning it to a permanent variable.

    data want;
        set have1
            have2
            have3
        indsname=dsn;
    
        dsname = dsn;
    run;
    

    You can then use substr() or other string functions to snag the name of the quarter from each dataset name. For example:

    qtr = substr(dsn, -6);