I have the following dataset:
data have;
input ID :$20. M1r2c1: $20. M2r2c1: $20. M11rxc1: $20. M11rxc2: $20.;
cards;
00x1 Frequent Rank 1 10-500 Frequent
0sx2 Infreqent Rank 5 10-500 Infrequent
0tb3 Often 10-30 Frequent
er04 Not Frequent Rank 3 Frequent
;
Is there a way to create a list of datasets (subsets of the dataset "have") based on the first pattern i.e., M*? In other words:
The dataset "have" contains 300 columns but M* is from 1 to 27.
It would be much easier if you renamed the variables so there was a delimiter after the numeric suffix.
Obs _NAME_ newname
1 ID ID
2 M1r2c1 M1_r2c1
3 M2r2c1 M2_r2c1
4 M11rxc1 M11_rxc1
5 M11rxc2 M11_rxc2
Then you could use the : wildcard in your keep statement.
keep ID M1_: ;
Here is one way to convert strings like M1r2c1 into M1_r2c1.
prxchange('s/^(M\d+)/$1_/',1,name)
Let's first fix your data step so it can actually run.
data have;
length ID M1r2c1 M2r2c1 M11rxc1 M11rxc2 $20;
input ID 1-4 M1r2c1 6-17 M2r2c1 19-25 M11rxc1 29-35 M11rxc2 39-50;
cards;
00x1 Frequent Rank 1 10-500 Frequent
0sx2 Infreqent Rank 5 10-500 Infrequent
0tb3 Often 10-30 Frequent
er04 Not Frequent Rank 3 Frequent
;
Now lets use an SQL query to make a macro variable with oldname=newname pairs that we can use to generate a RENAME statement or the RENAME= dataset option.
proc sql noprint;
select catx('='
,nliteral(name)
,nliteral(prxchange('s/^(M\d+)/$1_/',1,name))
)
into :renames separated by ' '
from dictionary.columns
where libname='WORK'
and memname='HAVE'
;
quit;
Now we can run a data step to make all three of your output datasets.
data want1(keep=id m1_: ) want2(keep=id m2_:) want11(keep=id m11_:) ;
set have(rename=(&renames));
run;
Results:
If you don't want to bother trying to learn how to use regular expressions you could use some other simpler SAS functions. Like VERIFY() and SUBSTR() to find the prefix for your variable names.
proc sql ;
create table groups as
select name
, substrn(name,1,verify(upcase(name),'M0123456789')-1) as prefix length=32
, input(substr(calculated prefix,2),32.) as group
from dictionary.columns
where libname='WORK'
and memname='HAVE'
and upcase(name) like 'M%'
order by group,name
;
quit;
Result
Obs name prefix group
1 M1r2c1 M1 1
2 M2r2c1 M2 2
3 M11rxc1 M11 11
4 M11rxc2 M11 11