sascopying

How to copy a variable to blank observations by other variables


I am attempting to copy data into blank rows based on the ID date and hour. I am having trouble getting this to apply based on examples I have seen on this forum. My extra layer of difficulty comes from each ID could need two different locations copied to it. Some example data below...

Obs ID     Date                   Hour Location    Assignment Date
1        260   08May2023    0                             07May2023
2       260   08May2023    1         A                   07May2023 
3       260   08May2023    2        A                   07May2023
4       260   08May2023    3                              07May2023
..........................................................................................................
15       260    08May2023   4        B                   10May2023
16       260    08May2023   5                              10May2023
17       300    08May2023   0                               03May2023
18       300    08May2023  1                                  03May2023
19       300    08May2023  2       C                      03May2023
10      300    08May2023  3        C                      03May2023
11      300    08May2023  4        D                       08May2023
12     300    08May2023  5                                   08May2023

The code I have used is this

proc sort data = have;
    by ID Date Hour;
run;

data want;
    set have;
    retain pcol1;

    if missing(Location) then Location= pcol1;
    pcol1 = Location;

    drop pcol:;
 run;

The problem with this code is observation 17 & 18 will copy above making it B when it should be C. Same for observation 1 if it has data above it. Also you can see there are occurrences where one ID will have two locations in the dataset (Which is done by week). Most will be in one location the entire week but it does happen where an ID can be in one location then have a gap and then reappear at a new location. The code I have tried above works except when there is an observation that has location blank above it. What code can be used to copy the cells with data for location into the cells missing location by ID and Assignment date in order of the date and hours. I say in order because the new assignment date is when you know the location has changed. For clarity this is how the date should fill in:

Obs ID     Date                   Hour Location    Assignment Date
1        260   08May2023    0       A                      07May2023
2       260   08May2023    1         A                   07May2023 
3       260   08May2023    2        A                   07May2023
4       260   08May2023    3       A                      07May2023
..........................................................................................................
15       260    08May2023   4       B                     10May2023
16       260    08May2023   5       B                      10May2023
17       300    08May2023   0     C                        03May2023
18       300    08May2023  1       C                          03May2023
19       300    08May2023  2      C                      03May2023
10      300    08May2023  3       C                      03May2023
11      300    08May2023  4        D                       08May2023
12     300    08May2023  5        D                          08May2023

Thank you in advance for anyone able to help me with this.


Solution

  • Since the location changes when assignment date changes, you can change the sort order and perform Last Observation Carried Forward (LOCF). Sort it by id date Assignment Date descending location to put all of the valid locations first:

    proc sort data=have;
        by id date Assignment_Date descending location;
    run;
    
    ID  Date        Hour    Location    Assignment_Date
    260 08MAY2023   1       A           07MAY2023
    260 08MAY2023   2       A           07MAY2023
    260 08MAY2023   0                   07MAY2023
    260 08MAY2023   3                   07MAY2023
    260 08MAY2023   4       B           10MAY2023
    260 08MAY2023   5                   10MAY2023
    300 08MAY2023   2       C           03MAY2023
    300 08MAY2023   3       C           03MAY2023
    300 08MAY2023   0                   03MAY2023
    300 08MAY2023   1                   03MAY2023
    300 08MAY2023   4       D           08MAY2023
    300 08MAY2023   5                   08MAY2023
    

    Now we just use retain logic to carry the last observation forward, then resort it back to the desired order. We shouldn't need to worry about by-group logic because we're ensuring that there's always a valid value of location first within each group of id date. If there is not always a valid value of location, then you'll need to account for this.

    data want;
        set have;
        retain _location;
    
        if(NOT missing(location)) then _location = location;
            else location = _location;
    
        drop _location;
    run;
    
    proc sort data=want;
        by ID Date Hour;
    run;
    
    ID  Date        Hour    Location    Assignment_Date
    260 08MAY2023   0       A           07MAY2023
    260 08MAY2023   1       A           07MAY2023
    260 08MAY2023   2       A           07MAY2023
    260 08MAY2023   3       A           07MAY2023
    260 08MAY2023   4       B           10MAY2023
    260 08MAY2023   5       B           10MAY2023
    300 08MAY2023   0       C           03MAY2023
    300 08MAY2023   1       C           03MAY2023
    300 08MAY2023   2       C           03MAY2023
    300 08MAY2023   3       C           03MAY2023
    300 08MAY2023   4       D           08MAY2023
    300 08MAY2023   5       D           08MAY2023