datatablesasdatasettransposefrequency

Transpose Multiple Columns in SAS


I have a dataset which looks like this:

Account Number  6m      7m      8m      9m      10m     11m
1               Better  X < 10  X < 10  Better  X < 30  X < 30
2               X < 10  X < 20  X < 30  X < 20  X < 20  X < 20
3               Better  Better  Better  Better  X < 10  X < 20
4               X < 10  Better  Same    Same    Same    Same
5               Same    Better  Same    Same    Same    Same
6               Same    Same    Same    Better  Better  Better
7               Same    X < 10  X < 10  X < 10  X < 10  Better
8               Better  Better  Better  Better  Better  Better
9               X < 10  X < 10  X < 10  X < 20  X < 30  Better
10              X < 20  X < 30  X < 30  X < 30  X < 30  X < 30

Where each cell tells me what's happened 6-11 months later for each account number. I want to turn this into a dataset that I can create graphs etc from, so would like to transpose it to look like this:

Result  6m  7m  8m  9m  10m 11m
X < 10  3   3   3   1   2   0
X < 20  1   1   0   2   1   2
X < 30  0   1   1   1   2   1
Same    3   1   3   2   2   2
Better  1   2   1   2   2   4

Even better if there was a way to turn the count into a % for each column.

data have;
    infile datalines dlm='|';
    input "Account Number"n "6m"n$ "7m"n$ "8m"n$ "9m"n$ "10m"n$ "11m"n$;
    datalines;
1|Better|X < 10|X < 10|Better|X < 30|X < 30
2|X < 10|X < 20|X < 30|X < 20|X < 20|X < 20
3|Better|Better|Better|Better|X < 10|X < 20
4|X < 10|Better|Same|Same|Same|Same
5|Same|Better|Same|Same|Same|Same
6|Same|Same|Same|Better|Better|Better
7|Same|X < 10|X < 10|X < 10|X < 10|Better
8|Better|Better|Better|Better|Better|Better
9| X < 10|X < 10|X < 10|X < 20|X < 30|Better
10| X < 20|X < 30|X < 30|X < 30|X < 30|X < 30
;
run;

Solution

  • First, stack the data so we can do some counting:

    data stack;
        set have;
        array charvars[*] _CHARACTER_;
    
        do i = 1 to dim(charvars);
            result = charvars[i];
            var    = vname(charvars[i]);
            output;
        end;
    
        keep result var;
    run;
    

    This gets you:

    result  var
    Better  6m
    X < 10  7m
    X < 10  8m
    Better  9m
    X < 30  10m
    X < 30  11m
    ...     ...
    

    I am certain with this data you can do something really cool with proc report, but that's not an area I know particularly well. Instead, we'll create the dataset in a few other steps.

    We can collapse this and count the number of values within each result, var combination, then calculate a percentage of each var within that:

    proc sql;
        create table pct as
            select result, var, total, total / sum(total) as pct format=percent8.1
                from (select result, var, count(*) as total
                      from stack
                      group by result, var
                     )
                group by var
                order by result, var
        ;
    quit;
    

    Which gets us this:

    result  var total pct
    Better  10m 2     20.0%
    Better  11m 4     40.0%
    Better  6m  3     30.0%
    Better  7m  4     40.0%
    Better  8m  2     20.0%
    Better  9m  4     40.0%
    ...     ... ... ...
    

    Now we have everything we need to transpose it into the format that we want. The id statement in proc transpose will allow us to use var as the name of each transposed column. We'll do this by result.

    proc transpose data=pct out=pct_tpose(drop=_NAME_);
        by result;
        id var;
        var pct;
    run;
    

    Which gets us almost what we want:

    result  10m     11m     6m       7m     8m      9m
    Better  20.0%   40.0%   30.0%   40.0%   20.0%   40.0%
    Same    20.0%   20.0%   30.0%   10.0%   30.0%   20.0%
    X < 10  20.0%   .       30.0%   30.0%   30.0%   10.0%
    X < 20  10.0%   20.0%   10.0%   10.0%   .       20.0%
    X < 30  30.0%   20.0%   .       10.0%   20.0%   10.0%
    

    Now we just need to clean it on up by:

    1. Filling in missing values with 0
    2. Reordering columns to the desired order
    3. Reordering result to the desired order
    /* Replace missing with 0 */
    proc stdize data=pct_tpose 
                out=want 
                missing=0 
                reponly;
    run;
    
    /* Fix sort order */
    data want_sorted;
        
        /* Set variable order */
        length Result $10.
               "6m"n "7m"n "8m"n "9m"n "10m"n "11m"n 8.
        ;
    
        set want;
        
        select(result);
            when('X < 10') order = 1;
            when('X < 20') order = 2;
            when('X < 30') order = 3;
            when('Same')   order = 4;
            otherwise      order = 5;
        end;
    run;
    
    proc sort data=want_sorted out=want_sorted_final(drop=order);
        by order;
    run;
    

    Which gets us our final result that we want:

    Result  6m      7m      8m      9m      10m     11m
    X < 10  30.0%   30.0%   30.0%   10.0%   20.0%   0.0%
    X < 20  10.0%   10.0%   0.0%    20.0%   10.0%   20.0%
    X < 30  0.0%    10.0%   20.0%   10.0%   30.0%   20.0%
    Same    30.0%   10.0%   30.0%   20.0%   20.0%   20.0%
    Better  30.0%   40.0%   20.0%   40.0%   20.0%   40.0%