I have a SAS-dataset with an overall structure similar to the one below. I want to calculate all "missing" values in the MONEY-column based on the previous month's MONEY and SPEND for each ID (student).
My data looks like this (INPUT_DATA):
ID MONTH MONEY SPEND
01 1 1000 300
01 2 . 200
01 3 . 200
01 4 . 300
02 1 2000 100
02 2 . 100
02 3 . 150
02 4 . 100
03 1 900 600
03 2 . 100
03 3 . 200
I want it to look like this (OUTPUT_DATA):
ID MONTH MONEY SPEND
01 1 1000 300
01 2 700 200
01 3 500 200
01 4 300 300
02 1 2000 100
02 2 1900 100
02 3 1800 150
02 4 1650 100
03 1 900 600
03 2 300 100
03 3 200 200
How is this possible using SAS? I tried something like this using the LAG-function:
DATA OUTPUT_DATA;
SET INPUT_DATA;
IF MONTH > 1 THEN MONEY = LAG1(MONEY) - LAG1(SPEND);
RUN;
But unfortunately, it does not return the expected output. The data can be created using this:
data INPUT_DATA;
infile datalines;
input ID MONTH MONEY SPEND;
datalines;
01 1 1000 300
01 2 . 200
01 3 . 200
01 4 . 300
02 1 2000 100
02 2 . 100
02 3 . 150
02 4 . 100
03 1 900 600
03 2 . 100
03 3 . 200
;
run;
Probably easier if you make a new variable that is retained.
data have;
input ID $ MONTH MONEY SPEND ;
cards;
01 1 1000 300
01 2 . 200
01 3 . 200
01 4 . 300
02 1 2000 100
02 2 . 100
02 3 . 150
02 4 . 100
03 1 900 600
03 2 . 100
03 3 . 200
;
data want;
set have ;
by id;
new_money = coalesce(money,new_money);
output;
new_money + -spend;
run;
Results:
new_
Obs ID MONTH MONEY SPEND money
1 01 1 1000 300 1000
2 01 2 . 200 700
3 01 3 . 200 500
4 01 4 . 300 300
5 02 1 2000 100 2000
6 02 2 . 100 1900
7 02 3 . 150 1800
8 02 4 . 100 1650
9 03 1 900 600 900
10 03 2 . 100 300
11 03 3 . 200 200
Add a DROP and RENAME statement if you want get rid of the original variable and re-use its name.
drop money;
rename new_money=money;