sas

Create sub-datasets based on pattern matching on the variable names


I have the following dataset:

data have;
  input ID :$20. M1r2c1: $20. M2r2c1: $20. M11rxc1: $20. M11rxc2: $20.; 
cards;
00x1 Frequent     Rank 1    10-500    Frequent
0sx2 Infreqent    Rank 5    10-500    Infrequent
0tb3 Often                  10-30     Frequent
er04 Not Frequent Rank 3              Frequent
;

Is there a way to create a list of datasets (subsets of the dataset "have") based on the first pattern i.e., M*? In other words:

The dataset "have" contains 300 columns but M* is from 1 to 27.


Solution

  • It would be much easier if you renamed the variables so there was a delimiter after the numeric suffix.

    Obs    _NAME_     newname
    
     1     ID         ID
     2     M1r2c1     M1_r2c1
     3     M2r2c1     M2_r2c1
     4     M11rxc1    M11_rxc1
     5     M11rxc2    M11_rxc2
    

    Then you could use the : wildcard in your keep statement.

    keep ID M1_: ;
    

    Here is one way to convert strings like M1r2c1 into M1_r2c1.

    prxchange('s/^(M\d+)/$1_/',1,name)
    

    Let's first fix your data step so it can actually run.

    data have;
      length ID M1r2c1 M2r2c1 M11rxc1 M11rxc2 $20; 
      input ID 1-4 M1r2c1 6-17 M2r2c1 19-25 M11rxc1 29-35 M11rxc2 39-50; 
    cards;
    00x1 Frequent     Rank 1    10-500    Frequent
    0sx2 Infreqent    Rank 5    10-500    Infrequent
    0tb3 Often                  10-30     Frequent
    er04 Not Frequent Rank 3              Frequent
    ;
    

    Now lets use an SQL query to make a macro variable with oldname=newname pairs that we can use to generate a RENAME statement or the RENAME= dataset option.

    proc sql noprint;
    select catx('='
            ,nliteral(name)
            ,nliteral(prxchange('s/^(M\d+)/$1_/',1,name))
            )
      into :renames separated by ' '
      from dictionary.columns
      where libname='WORK'
        and memname='HAVE'
    ;
    quit;
    

    Now we can run a data step to make all three of your output datasets.

    data want1(keep=id m1_: ) want2(keep=id m2_:) want11(keep=id m11_:) ;
      set have(rename=(&renames));
    run;
    

    Results:

    enter image description hereenter image description hereenter image description here

    If you don't want to bother trying to learn how to use regular expressions you could use some other simpler SAS functions. Like VERIFY() and SUBSTR() to find the prefix for your variable names.

    proc sql ;
    create table groups as
      select name
           , substrn(name,1,verify(upcase(name),'M0123456789')-1) as prefix length=32
           , input(substr(calculated prefix,2),32.) as group
      from dictionary.columns
      where libname='WORK'
        and memname='HAVE'
        and upcase(name) like 'M%'
      order by group,name
    ;
    quit;
    

    Result

    Obs     name      prefix    group
    
     1     M1r2c1      M1          1
     2     M2r2c1      M2          2
     3     M11rxc1     M11        11
     4     M11rxc2     M11        11