I am trying to add the populations of a few age groups (40-44, 45-49, 50-55) together by state and year to create a greater than 40 population. I cannot figure out how to do this within sas. Any help would be appreciated!
Sample data:
data have;
input state$ year age_grp$ pop;
datalines;
Alabama 2010 40-44 12345
Alabama 2010 45-49 23456
Alabama 2010 50-54 13452
Alabama 2011 40-44 22342
Alabama 2011 45-49 11234
Alabama 2011 50-54 43213
Arizona 2010 40-44 11111
Arizona 2010 45-49 22222
Arizona 2010 50-54 11221
Arizona 2011 40-44 12312
Arizona 2011 45-49 21212
Arizona 2011 50-54 12312
;
run;
I am trying to do this within the datastep and create a gt40_pop variable that I can use while keeping the original data for subsequent analyses.
Thank you in advance!
The most efficient way to do this would be to create a format and apply it when needed. You can then sum up your variables with something like PROC MEANS or PROC SQL to create those sums without needing to create any extra variables. For example:
proc format;
value $fortyplus
'0-4' = '0-4'
'5-9' = '5-9'
'10-14' = '10-14'
'15-19' = '15-19'
'20-24' = '20-24'
'25-29' = '25-29'
'30-34' = '30-34'
'35-39' = '35-39'
other = '40+'
;
run;
Now you can apply it in PROCS or in SQL:
proc means data=have noprint;
format age_grp $fortyplus.;
class state year age_grp;
ways 3;
output out=want
sum(pop) = total_pop
;
run;
Or:
proc sql;
create table want as
select state
, year
, put(age_grp, $fortyplus.) as age_grp
, sum(pop) as total_pop
from have
group by state, year, calculated age_grp
;
quit;
state year age_grp total_pop
Alabama 2010 40+ 49253
Alabama 2011 40+ 76789
Arizona 2010 40+ 44554
Arizona 2011 40+ 45836