I have a SQL Server system-versioned table with ~20 columns, all of which hold changing values over time. I am trying to get the changed values for only a subset of the columns along with their respective validity columns.
Some example data:
Name | Company | Location | SysStartTime | SysEndTime |
---|---|---|---|---|
Employee1 | Company A | New York | 2023-11-23 05:28:46.9571214 | 2023-12-07 05:20:40.7315348 |
Employee1 | Company A | San Francisco | 2023-12-07 05:20:40.7315348 | 2024-01-26 05:13:37.1539216 |
Employee1 | Company B | Berlin | 2024-01-26 05:13:37.1539216 | 2024-01-27 05:13:28.0830253 |
Employee1 | Company A | Tokyo | 2024-01-27 05:13:28.0830253 | 2024-03-09 05:12:29.7629149 |
Employee1 | Company A | Rome | 2024-03-09 05:12:29.7629149 | 2024-04-13 04:10:13.4617646 |
Employee1 | Company A | Kinshasa | 2024-04-13 04:10:13.4617646 | 9999-12-31 23:59:59.9999999 |
Employee2 | Company A | Newtown | 2023-11-23 05:28:46.9571214 | 2024-01-26 05:13:37.1539216 |
Employee2 | Company A | Oldtown | 2024-01-26 05:13:37.1539216 | 2024-04-13 04:10:13.4617646 |
Employee2 | Company C | Towntown | 2024-04-13 04:10:13.4617646 | 9999-12-31 23:59:59.9999999 |
And the desired output:
Name | Company | SysStartTime | SysEndTime |
---|---|---|---|
Employee1 | Company A | 2023-11-23 05:28:46.9571214 | 2024-01-26 05:13:37.1539216 |
Employee1 | Company B | 2024-01-26 05:13:37.1539216 | 2024-01-27 05:13:28.0830253 |
Employee1 | Company A | 2024-01-27 05:13:28.0830253 | 9999-12-31 23:59:59.9999999 |
Employee2 | Company A | 2023-11-23 05:28:46.9571214 | 2024-04-13 04:10:13.4617646 |
Employee2 | Company C | 2024-04-13 04:10:13.4617646 | 9999-12-31 23:59:59.9999999 |
Basically, I want the validity periods for each combination of Name and Company, ignoring changes in Location but still using the SysStartTime/SysEndTime values that were recorded when Location changed.
I may later need to include additional columns in the same way, again ignoring changes to the columns I don't care about.
What query would give me the desired result?
Use a self-join in the first CTE to fetch the Company from each row's preceding version, so that changes in Company can be identified. The second CTE groups consecutive records that share the same Company, and the final query consolidates each group, taking the earliest start and latest end time for each continuous period per employee and company:
WITH Previous AS (
    -- Pair each version with the one that ended exactly when it started
    SELECT
        curr.Name,
        curr.Company,
        curr.SysStartTime,
        curr.SysEndTime,
        prev.Company AS PrevCompany
    FROM EmployeeHistory curr
    LEFT JOIN EmployeeHistory prev
        ON curr.Name = prev.Name
        AND curr.SysStartTime = prev.SysEndTime
)
, Grouped AS (
    -- Running count of Company changes; rows in one unbroken run share a GroupID
    SELECT *,
        SUM(CASE WHEN Company <> PrevCompany OR PrevCompany IS NULL THEN 1 ELSE 0 END)
            OVER (PARTITION BY Name ORDER BY SysStartTime ROWS UNBOUNDED PRECEDING) AS GroupID
    FROM Previous
)
-- Collapse each run into a single validity period
SELECT
    Name,
    Company,
    MIN(SysStartTime) AS SysStartTime,
    MAX(SysEndTime) AS SysEndTime
FROM Grouped
GROUP BY Name, Company, GroupID
ORDER BY Name, SysStartTime;
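The self-join relies on each version starting exactly when the previous one ended. As a variant of the same idea (not the query above), the previous Company can be read with `LAG` instead, which avoids the join. This is only a sketch under a few assumptions: `dbo.Employee` is a hypothetical name for the system-versioned table itself, `FOR SYSTEM_TIME ALL` is used so that both current and history rows are read, and Company is assumed to be non-nullable.

```sql
-- Sketch: LAG-based variant of the grouping above.
-- dbo.Employee is a hypothetical table name; adjust to your schema.
WITH Versions AS (
    SELECT
        Name,
        Company,
        SysStartTime,
        SysEndTime,
        -- Company of the previous version of the same employee (NULL for the first row)
        LAG(Company) OVER (PARTITION BY Name ORDER BY SysStartTime) AS PrevCompany
    FROM dbo.Employee FOR SYSTEM_TIME ALL
),
Grouped AS (
    SELECT
        Name,
        Company,
        SysStartTime,
        SysEndTime,
        -- Add 1 whenever Company differs from the previous version (or there is none),
        -- so rows in the same unbroken run share a GroupID
        SUM(CASE WHEN PrevCompany = Company THEN 0 ELSE 1 END)
            OVER (PARTITION BY Name ORDER BY SysStartTime
                  ROWS UNBOUNDED PRECEDING) AS GroupID
    FROM Versions
)
SELECT
    Name,
    Company,
    MIN(SysStartTime) AS SysStartTime,
    MAX(SysEndTime)   AS SysEndTime
FROM Grouped
GROUP BY Name, Company, GroupID
ORDER BY Name, SysStartTime;
```

To track more columns, add a `LAG` for each one, extend the `CASE` test so that a difference in any of them starts a new group, and include them in the final `SELECT` and `GROUP BY`. If any tracked column can be NULL, the plain `=` comparison treats NULL-to-NULL as a change, so a NULL-safe comparison would be needed there.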