sasproc-sqlimport-from-excel

How to remove columns specified in an Excel file from a SAS table?


I have table in SAS like below:

TABLE 1

COL3  | COL2 | ...  | COLn
------|------|------|----
111   | ABC  | ...  | 12
222   | AAA  | ...  | 10.5
333   | app  | ...  | 5
...   | ...  | ...  | ...

And I have table in Excel file like below:

TABLE 2

GROUP| NAME  | DEFINITION
-----|-------|-------------
ABC  | COL1  | xxxxxxx
ABC  | COL2  | xxxxxxxxxx
BBB  | COL15 | xxxxxxxxx
...  | ...   | ...

And I need to remove from TABLE 1 (SAS table) variables which are listed in "NAME" column in TABLE 2 (Excel file).

So as a result i need something like below (using only example above but i have many more columns of course).

COL3  |  ... | COLn
------|------|----
111   | ...  | 12
222   | ...  | 10.5
333   | ...  | 5
...   | ...  | ...

How can I do that in SAS Enterprise Guide ?


Solution

  • Create sample dataset and .xlsx file

    data have;
    input col1-col5 ;
    cards;
    1 2 3 4 5
    6 7 8 9 10
    ;
    run;
    

    enter image description here

    Import the file

    proc import datafile='/home/kermit/have.xlsx' 
            dbms=XLSX replace 
            out=have_xlsx(keep=NAME);
    run;
    

    Fill the cols macro variable with the list of distinct columns to drop from the name column

    proc sql noprint;
        select distinct name into :cols separated by ' '
        from have_xlsx;
    quit;
    

    Drop the selected columns

    data want;
        set have(drop=&cols.);
    run;
    

    As a result, only col3 is kept

    col3
     3
     8
    

    After your comment

    There are two ways

    1. Play with the DKRICOND option before the data step. It will suppress the error message that would normally be generated by trying to drop a variable that does not exist and will let your data step execute (not recommended)
    option DKRICOND=NOWARN;
    data want;
        set have(drop=&cols.);
    run;
    option DKRICOND=ERROR; 
    
    1. Adapt the current SQL query to only retrieve columns that exist in both tables
    proc sql noprint;
        select distinct upcase(name) into :cols separated by ' '
        from have_xlsx
        where upcase(name) in 
        (select distinct upcase(name) from sashelp.vcolumn
            where upcase(memname) = 'HAVE');
    quit;