I have a dataset of rates by age, state, and year. Each state and year has multiple observations with only one rate per observation. I need help combining the observations so all the rates per state/year are in one observation. The dataset looks like the below:
data test_00;
input State$ year rt_15_30 rt_31_60 rt_61plus;
datalines;
AL 2010 15 . .
AL 2010 . 18 .
AL 2010 . . 20
AL 2011 18 . .
AL 2011 . 17 .
AL 2011 . . 22
MA 2010 10 . .
MA 2010 . 11 .
MA 2010 . . 12
MA 2011 11 . .
MA 2011 . 13 .
MA 2011 . . 15
;
run;
I am trying to set up my dataset to look like the below:
data test_01;
input State$ year rt_15_30 rt_31_60 rt_61plus;
datalines;
AL 2010 15 18 20
AL 2011 18 17 22
MA 2010 10 11 12
MA 2011 11 13 15
;
run;
Any guidance on this would be greatly appreciated!! Thank you in advance!
This is a perfect structure to make use of the UPDATE statement. The UPDATE statement was designed for applying transactions to an existing dataset. When the transactions are applied only the non-missing values will cause a change in the value of a variable.
The UPDATE statement requires two datasets. An original dataset that can have at most one observation per by group. And a transactions dataset that can have multiple observations per by group. The implicit OUTPUT statement at the end of the data step will only execute for the last observation in the by group so that the result also have only one observation per by group.
You can use an OBS=0 dataset option so that you can use your dataset for both; an empty version of your dataset as the original data and the complete dataset as the dataset of transactions to be applied.
data test_01;
update test_00(obs=0) test_00;
by state year;
run;