sasdatastep

Calculate row values based on previous calulated rows SAS


I have a SAS-dataset with an overall structure similar to the one below. I want to calculate all "missing" values in the MONEY-column based on the previous month's MONEY and SPEND for each ID (student).

My data looks like this (INPUT_DATA):

ID        MONTH       MONEY      SPEND  
01            1        1000        300
01            2           .        200
01            3           .        200
01            4           .        300
02            1        2000        100
02            2           .        100
02            3           .        150
02            4           .        100
03            1         900        600
03            2           .        100
03            3           .        200

I want it to look like this (OUTPUT_DATA):

ID        MONTH       MONEY      SPEND  
01            1        1000        300
01            2         700        200
01            3         500        200
01            4         300        300
02            1        2000        100
02            2        1900        100
02            3        1800        150
02            4        1650        100
03            1         900        600
03            2         300        100
03            3         200        200

How is this possible using SAS? I tried something like this using the LAG-function:

DATA OUTPUT_DATA;
    SET INPUT_DATA;
    IF MONTH > 1 THEN MONEY = LAG1(MONEY) - LAG1(SPEND);
RUN;

But unfortunately, it does not return the expected output. The data can be created using this:

data INPUT_DATA;
  infile datalines;
  input ID MONTH MONEY SPEND;
datalines;
01            1        1000        300
01            2           .        200
01            3           .        200
01            4           .        300
02            1        2000        100
02            2           .        100
02            3           .        150
02            4           .        100
03            1         900        600
03            2           .        100
03            3           .        200
;
run;

Solution

  • Probably easier if you make a new variable that is retained.

    data have;
      input ID $ MONTH MONEY SPEND  ;
    cards;
    01 1 1000 300
    01 2    . 200
    01 3    . 200
    01 4    . 300
    02 1 2000 100
    02 2    . 100
    02 3    . 150
    02 4    . 100
    03 1  900 600
    03 2    . 100
    03 3    . 200
    ;
    
    data want;
      set have ;
      by id;
      new_money = coalesce(money,new_money);
      output;
      new_money + -spend;
    run;
    

    Results:

                                             new_
    Obs    ID    MONTH    MONEY    SPEND    money
    
      1    01      1       1000     300      1000
      2    01      2          .     200       700
      3    01      3          .     200       500
      4    01      4          .     300       300
      5    02      1       2000     100      2000
      6    02      2          .     100      1900
      7    02      3          .     150      1800
      8    02      4          .     100      1650
      9    03      1        900     600       900
     10    03      2          .     100       300
     11    03      3          .     200       200
    

    Add a DROP and RENAME statement if you want get rid of the original variable and re-use its name.

    drop money;
    rename new_money=money;