I have a dataset which looks like this:
Account Number 6m 7m 8m 9m 10m 11m
1 Better X < 10 X < 10 Better X < 30 X < 30
2 X < 10 X < 20 X < 30 X < 20 X < 20 X < 20
3 Better Better Better Better X < 10 X < 20
4 X < 10 Better Same Same Same Same
5 Same Better Same Same Same Same
6 Same Same Same Better Better Better
7 Same X < 10 X < 10 X < 10 X < 10 Better
8 Better Better Better Better Better Better
9 X < 10 X < 10 X < 10 X < 20 X < 30 Better
10 X < 20 X < 30 X < 30 X < 30 X < 30 X < 30
Where each cell tells me what's happened 6-11 months later for each account number. I want to turn this into a dataset that I can create graphs etc from, so would like to transpose it to look like this:
Result 6m 7m 8m 9m 10m 11m
X < 10 3 3 3 1 2 0
X < 20 1 1 0 2 1 2
X < 30 0 1 1 1 2 1
Same 3 1 3 2 2 2
Better 1 2 1 2 2 4
Even better if there was a way to turn the count into a % for each column.
data have;
infile datalines dlm='|';
input "Account Number"n "6m"n$ "7m"n$ "8m"n$ "9m"n$ "10m"n$ "11m"n$;
datalines;
1|Better|X < 10|X < 10|Better|X < 30|X < 30
2|X < 10|X < 20|X < 30|X < 20|X < 20|X < 20
3|Better|Better|Better|Better|X < 10|X < 20
4|X < 10|Better|Same|Same|Same|Same
5|Same|Better|Same|Same|Same|Same
6|Same|Same|Same|Better|Better|Better
7|Same|X < 10|X < 10|X < 10|X < 10|Better
8|Better|Better|Better|Better|Better|Better
9| X < 10|X < 10|X < 10|X < 20|X < 30|Better
10| X < 20|X < 30|X < 30|X < 30|X < 30|X < 30
;
run;
First, stack the data so we can do some counting:
data stack;
set have;
array charvars[*] _CHARACTER_;
do i = 1 to dim(charvars);
result = charvars[i];
var = vname(charvars[i]);
output;
end;
keep result var;
run;
This gets you:
result var
Better 6m
X < 10 7m
X < 10 8m
Better 9m
X < 30 10m
X < 30 11m
... ...
I am certain with this data you can do something really cool with proc report
, but that's not an area I know particularly well. Instead, we'll create the dataset in a few other steps.
We can collapse this and count the number of values within each result, var
combination, then calculate a percentage of each var
within that:
proc sql;
create table pct as
select result, var, total, total / sum(total) as pct format=percent8.1
from (select result, var, count(*) as total
from stack
group by result, var
)
group by var
order by result, var
;
quit;
Which gets us this:
result var total pct
Better 10m 2 20.0%
Better 11m 4 40.0%
Better 6m 3 30.0%
Better 7m 4 40.0%
Better 8m 2 20.0%
Better 9m 4 40.0%
... ... ... ...
Now we have everything we need to transpose it into the format that we want. The id
statement in proc transpose
will allow us to use var
as the name of each transposed column. We'll do this by result
.
proc transpose data=pct out=pct_tpose(drop=_NAME_);
by result;
id var;
var pct;
run;
Which gets us almost what we want:
result 10m 11m 6m 7m 8m 9m
Better 20.0% 40.0% 30.0% 40.0% 20.0% 40.0%
Same 20.0% 20.0% 30.0% 10.0% 30.0% 20.0%
X < 10 20.0% . 30.0% 30.0% 30.0% 10.0%
X < 20 10.0% 20.0% 10.0% 10.0% . 20.0%
X < 30 30.0% 20.0% . 10.0% 20.0% 10.0%
Now we just need to clean it on up by:
result
to the desired order/* Replace missing with 0 */
proc stdize data=pct_tpose
out=want
missing=0
reponly;
run;
/* Fix sort order */
data want_sorted;
/* Set variable order */
length Result $10.
"6m"n "7m"n "8m"n "9m"n "10m"n "11m"n 8.
;
set want;
select(result);
when('X < 10') order = 1;
when('X < 20') order = 2;
when('X < 30') order = 3;
when('Same') order = 4;
otherwise order = 5;
end;
run;
proc sort data=want_sorted out=want_sorted_final(drop=order);
by order;
run;
Which gets us our final result that we want:
Result 6m 7m 8m 9m 10m 11m
X < 10 30.0% 30.0% 30.0% 10.0% 20.0% 0.0%
X < 20 10.0% 10.0% 0.0% 20.0% 10.0% 20.0%
X < 30 0.0% 10.0% 20.0% 10.0% 30.0% 20.0%
Same 30.0% 10.0% 30.0% 20.0% 20.0% 20.0%
Better 30.0% 40.0% 20.0% 40.0% 20.0% 40.0%