I am attempting to copy data into blank rows based on the ID date and hour. I am having trouble getting this to apply based on examples I have seen on this forum. My extra layer of difficulty comes from each ID could need two different locations copied to it. Some example data below...
Obs ID Date Hour Location Assignment Date
1 260 08May2023 0 07May2023
2 260 08May2023 1 A 07May2023
3 260 08May2023 2 A 07May2023
4 260 08May2023 3 07May2023
..........................................................................................................
15 260 08May2023 4 B 10May2023
16 260 08May2023 5 10May2023
17 300 08May2023 0 03May2023
18 300 08May2023 1 03May2023
19 300 08May2023 2 C 03May2023
10 300 08May2023 3 C 03May2023
11 300 08May2023 4 D 08May2023
12 300 08May2023 5 08May2023
The code I have used is this
proc sort data = have;
by ID Date Hour;
run;
data want;
set have;
retain pcol1;
if missing(Location) then Location= pcol1;
pcol1 = Location;
drop pcol:;
run;
The problem with this code is observation 17 & 18 will copy above making it B when it should be C. Same for observation 1 if it has data above it. Also you can see there are occurrences where one ID will have two locations in the dataset (Which is done by week). Most will be in one location the entire week but it does happen where an ID can be in one location then have a gap and then reappear at a new location. The code I have tried above works except when there is an observation that has location blank above it. What code can be used to copy the cells with data for location into the cells missing location by ID and Assignment date in order of the date and hours. I say in order because the new assignment date is when you know the location has changed. For clarity this is how the date should fill in:
Obs ID Date Hour Location Assignment Date
1 260 08May2023 0 A 07May2023
2 260 08May2023 1 A 07May2023
3 260 08May2023 2 A 07May2023
4 260 08May2023 3 A 07May2023
..........................................................................................................
15 260 08May2023 4 B 10May2023
16 260 08May2023 5 B 10May2023
17 300 08May2023 0 C 03May2023
18 300 08May2023 1 C 03May2023
19 300 08May2023 2 C 03May2023
10 300 08May2023 3 C 03May2023
11 300 08May2023 4 D 08May2023
12 300 08May2023 5 D 08May2023
Thank you in advance for anyone able to help me with this.
Since the location changes when assignment date changes, you can change the sort order and perform Last Observation Carried Forward (LOCF). Sort it by id date Assignment Date descending location
to put all of the valid locations first:
proc sort data=have;
by id date Assignment_Date descending location;
run;
ID Date Hour Location Assignment_Date
260 08MAY2023 1 A 07MAY2023
260 08MAY2023 2 A 07MAY2023
260 08MAY2023 0 07MAY2023
260 08MAY2023 3 07MAY2023
260 08MAY2023 4 B 10MAY2023
260 08MAY2023 5 10MAY2023
300 08MAY2023 2 C 03MAY2023
300 08MAY2023 3 C 03MAY2023
300 08MAY2023 0 03MAY2023
300 08MAY2023 1 03MAY2023
300 08MAY2023 4 D 08MAY2023
300 08MAY2023 5 08MAY2023
Now we just use retain logic to carry the last observation forward, then resort it back to the desired order. We shouldn't need to worry about by-group logic because we're ensuring that there's always a valid value of location first within each group of id date
. If there is not always a valid value of location, then you'll need to account for this.
data want;
set have;
retain _location;
if(NOT missing(location)) then _location = location;
else location = _location;
drop _location;
run;
proc sort data=want;
by ID Date Hour;
run;
ID Date Hour Location Assignment_Date
260 08MAY2023 0 A 07MAY2023
260 08MAY2023 1 A 07MAY2023
260 08MAY2023 2 A 07MAY2023
260 08MAY2023 3 A 07MAY2023
260 08MAY2023 4 B 10MAY2023
260 08MAY2023 5 B 10MAY2023
300 08MAY2023 0 C 03MAY2023
300 08MAY2023 1 C 03MAY2023
300 08MAY2023 2 C 03MAY2023
300 08MAY2023 3 C 03MAY2023
300 08MAY2023 4 D 08MAY2023
300 08MAY2023 5 D 08MAY2023