saspercentagesas-studio

How to divide all the observations based on a sum of a column


I'm trying to do simple calculations but I'm new and SAS is not intuitive to me.

Suppose I have this table.

data money;
infile datalines delimiter=",";
input name $ return $ invested;
datalines;
Joe,10,100
Bob,7,50
Mary,80,1000
;

Which creates this

/* name | return | invested */
/* _________________________ */
/* Joe  | 10     | 100 */
/* Bob  | 7      | 50 */
/* Mary | 80     | 50 */

I have three things I would like to do for my job that just switched over to SAS.

  1. I need to make sure columns return and invested are numeric. When I run the code above, return column ends up being a CHAR column and I don't know why.

  2. Now I want to create a new column and calculate the share of the total return they each got. In this case, the sum of return=97. This is the result I want.

/* name     | return | invested | share_of_return */
/* ____________________________________________ */
/* Joe      |  10   | 100       | 10.30% */
/* Bob      |   7   | 50        | 7.22% */
/* Mary     | 80    | 50        | 82.47% */
  1. Next I want to find their ROI. Which is (return-investment) / investment * 100. This is the result I am looking for
/* Find ROI */

/* name | return | invested | share_of_return | ROI */
/* ___________________________________________________ */
/* Joe  | 10     | 100      | 10.30%          | -90% */
/* Bob  | 7      | 50       | 7.22%           | -86% */
/* Mary | 80     | 50       | 82.47%          | 60% */

I appreciate your explanations and guidance in advanced. This is for a work project and we just switched over to SAS


Solution

  • 1 & 3 are easy, 2 is slightly more difficult.

    1. Remove $ in INPUT statement. $ indicates character. In your data you may need to convert it using the input function instead though.

    Fix for example:

    input name $ return  invested;
    

    Fix for actual data using input function. Note that you cannot convert types in a data step to the same name so I rename it while reading it in using the rename data set option.

    data money2;
    set money (rename = return = return_char);
    return = input(return_char, best.);
    drop return_char;
    run;
    
    1. Add total value to data step, SQL is fastest here:
    proc sql;
    create table money3 as
    select *, sum(return) as return_total, return/calculated return_total as return_percentage f=percent12.1
    from money2;
    quit;
    

    I outline two different methods of doing this here

    1. Within a data step, add your calculation. It's probably most efficient if it can be done in first step. Since a data step loops automatically you write the formula pretty much as shown. In this case I've also applied a format so it shows as a percentage but that requires you to not multiply it by 100. Depending on what you're doing next it may be best to leave it as numeric.
    data money2;
    set money (rename = return = return_char);
    return = input(return_char, best.);
    ROI = (return - investment)/investment;
    format ROI percent12.1;
    run;
    drop return_char;
    run;