Currently I have a very big table that have a snapshot of data for each month.
ID | other | Team | Period |
---|---|---|---|
1 | ..... | A | 2020-04-30 |
1 | ..... | A | 2020-05-31 |
1 | ..... | A | 2020-06-30 |
1 | ..... | A | 2020-07-31 |
1 | ..... | B | 2020-08-31 |
1 | ..... | B | 2020-09-30 |
1 | ..... | C | 2020-10-31 |
1 | ..... | C | 2020-11-30 |
1 | ..... | C | 2020-12-31 |
I would like to implement SCD on this table based on comparison of all listed columns. How Can I achieve it?
In this example I would like to have only 3 rows where Team column was changed and proper valid_from, valid_to columns based on period column.
If you can get those who built the monthly snapshot to do otherwise, encourage them strongly to do so.
For the rest: in one query, get the previous value for team
, as prev_team
- I do that in a WITH clause, and name the Common table Expression w_prev
.
Then I run an outer query on w_prev
, filtering by the previous team
being NULL or different from the current team
value - and use the three-parameter variant of the LEAD()
OLAP function to get the next date, or, if missing, a maximum possible date:
WITH
-- your input, don't use in final query ..
indata(id,other,team,period) AS (
SELECT 1,'.....','A',CAST('2020-04-30' AS DATE)
UNION ALL SELECT 1,'.....','A',CAST('2020-05-31' AS DATE)
UNION ALL SELECT 1,'.....','A',CAST('2020-06-30' AS DATE)
UNION ALL SELECT 1,'.....','A',CAST('2020-07-31' AS DATE)
UNION ALL SELECT 1,'.....','B',CAST('2020-08-31' AS DATE)
UNION ALL SELECT 1,'.....','B',CAST('2020-09-30' AS DATE)
UNION ALL SELECT 1,'.....','C',CAST('2020-10-31' AS DATE)
UNION ALL SELECT 1,'.....','C',CAST('2020-11-30' AS DATE)
UNION ALL SELECT 1,'.....','C',CAST('2020-12-31' AS DATE)
)
-- end of input, replace following comma with "WITH" ..
,
w_prev AS (
SELECT
*
, LAG(team) OVER(PARTITION BY id ORDER BY period) AS prev_team
FROM indata
)
SELECT
id
, other
, team
, period
, LEAD(period,1,CAST('9999-12-01' AS DATE)) OVER(PARTITION BY id ORDER BY period) AS to_period
FROM w_prev
WHERE prev_team IS NULL
OR prev_team <> team
;
id | other | team | period | to_period |
---|---|---|---|---|
1 | ..... | A | 2020-04-30 | 2020-08-31 |
1 | ..... | B | 2020-08-31 | 2020-10-31 |
1 | ..... | C | 2020-10-31 | 9999-12-01 |