I would like to turn the following MWE into a data step (possibly using a macro) so I can run it with six years worth of data, and five or six columns of data. Here the years are pre-defined variables.
PROC SQL;
CREATE TABLE SASUSER.OUT_DATA AS
SELECT &YEAR1 AS YEAR,
ID,
ROUND(ANN_TURN&YEAR1) AS ANN_TURN,
ROUND(ANN_PROF&YEAR1) AS ANN_PROF
FROM WORK.IN_DATA
WHERE ANN_TURN&YEAR1 IS NOT MISSING OR
ANN_PROF&YEAR1 IS NOT MISSING
UNION
SELECT &YEAR2 AS YEAR,
ID,
ROUND(ANN_TURN&YEAR2) AS ANN_TURN,
ROUND(ANN_PROF&YEAR2) AS ANN_PROF
FROM WORK.IN_DATA
WHERE ANN_TURN&YEAR2 IS NOT MISSING OR
ANN_PROF&YEAR2 IS NOT MISSING
UNION
SELECT &YEAR3 AS YEAR,
ID,
ROUND(ANN_TURN&YEAR3) AS ANN_TURN,
ROUND(ANN_PROF&YEAR3) AS ANN_PROF
FROM WORK.IN_DATA
WHERE ANN_TURN&YEAR3 IS NOT MISSING OR
ANN_PROF&YEAR3 IS NOT MISSING;
QUIT;
Example of IN_DATA (apologies I'm not sure how to add a table in correct format):
ID ANN_TURN_Y1819 ANN_TURN_Y1920 ANN_TURN_Y2021 ANN_PROF_Y1819 ANN_PROF_Y1920 ANN_PROF_Y2021
1 100 110 120 10 12 15
The DATA Step SET
statement is used to stack data sets.
However, your SQL is performing a UNION of selections from the same data set, essentially reshaping the data through a transposition.
If you are ROUNDing for display purposes, use a FORMAT instead. The WHERE clause might be suspect. OR'ing double negatives always made this old head wobble.
Example:
You can do this in a non-macro way using VVALUEX
to extract the value of a dynamically named variable.
data have;
do companyid = 1 to 1000;
array ann_turn ann_turn1920-ann_turn2025;
array ann_prof ann_prof1920-ann_prof2025;
do over ann_turn ;
x + 1; ann_turn = x;
y + 10; ann_prof = y;
end;
output;
end;
keep companyid ann_turn: ann_prof: ;
run;
%let years_list = 1920, 1940, 1961, 1962, 1991 to 1995, 2009;
data want(keep=companyid year ann_turn ann_prof);
set have;
do year = &years_list;
ann_turn = input(vvaluex(cats('ANN_TURN', year)),best16.);
ann_prof = input(vvaluex(cats('ANN_PROF', year)),best16.);
output;
end;
format ann_: dollar 15.;
run;