I have the following simplified by many columns table.
Person Period Cost_Center Previous_Cost_Center
----------- ---------------- ----------- ----------------------
123 2019003 1
123 2019004 1
123 2019005 2
123 2019006 2
123 2019007 3
For each Person, I would like to find his Previous Cost Center and save it in the corresponding column. The previous Cost Center has to be different from the current one.
What I would like to get:
Person Period Cost_Center Previous_Cost_Center
----------- ---------------- ----------- ---------------------
123 2019003 1 NULL
123 2019004 1 NULL
123 2019005 2 1
123 2019006 2 1 <----- Problematic row
123 2019007 3 2
Standard LAG() function over the period would actually output:
Person Period Cost_Center Previous_Cost_Center
----------- ---------------- ----------- ---------------------
123 2019003 1 NULL
123 2019004 1 NULL
123 2019005 2 1
123 2019006 2 2 <----- Problematic row
123 2019007 3 2
What I would like to have in a problematic row is the last different Cost_Center value which is 1 and not 2. What I thought of using is to check if the previous Cost_Center is different:
CASE
WHEN
LAG ( Cost_Center ) OVER ( PARTITION BY Person ORDER BY Period ) != Cost_Center
THEN
LAG ( Cost_Center ) OVER ( PARTITION BY Person ORDER BY Period )
ELSE
Previous_Cost_Center
END
AS Previous_Cost_Center,
But then I end up without a Previous_Cost_Center value in row 4 at all. How can this be archived using SQL? How can I take over the last different Cost_Center value to save it in Previous_Cost_Center sorting by Period?
You can do this with window functions, but it is trickier than necessary because the ignore null
s option is not available.
First, you want to assign a grouping to adjacent values of the same group.
Second, you want to give the first value in the grouping the previous cost center.
Third, you want to "spread" that value through the entire group.
select t.*,
max(case when immediate_prev_cost_center <> cost_center then immediate_prev_cost_center
end) over (partition by person, cost_center, (seqnum - seqnum_2)
) as prev_cost_center
from (select t.*,
row_number() over (partition by person order by period) as seqnum,
row_number() over (partition by person, cost_center order by period) as seqnum_2,
lag(cost_center) over (partition by person order by period) as immediate_prev_cost_center
from t
) t;
The difference of row numbers defines the group. The max()
spreads the previous value throughout the group.
Here is a db<>fiddle.