I am new to BQ and need to write a query that returns rows where a particular value has changed on or after 01/09/2024 and then also return the last row before that change (unless the particular value is the same as the previous most recent change). Also, I don't want to return a row if a particular second field value has changed.
Please see screenshot of the data in bigquery with some commentary added and the result I'm looking to return.
Sample data and expected result
Appreciate any assistance in how this can be achieved.
EDIT: Example dataset where UserId '987654' has no value changes across all rows - the date isn't copnsidered a changed value.
WITH
myTable AS ( (
SELECT 123456 AS UserId, DATE('2024-09-01') AS EffectiveDate, 'Supervisor' AS Role, 2000 AS PostCode, 'Suburb1' AS Suburb, 'State1' AS State, 'Country1' AS Country UNION ALL
SELECT 123456, DATE('2024-09-03'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-05'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-07'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-09'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-11'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-13'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-15'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-17'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-09'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-11'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-13'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-15'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-17'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-19'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-21'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-23'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-25'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1')
ORDER BY
UserId,
EffectiveDate )
SELECT
UserId,
EffectiveDate,
Role,
PostCode,
Suburb,
State,
Country,
FROM
myTable
QUALIFY
(PostCode != LAG(Postcode) OVER (PARTITION BY UserId ORDER BY EffectiveDate)
OR LAG(Postcode) OVER (PARTITION BY UserId ORDER BY EffectiveDate) IS NULL)
AND (NOT Role != LAG(Role) OVER (PARTITION BY UserId ORDER BY EffectiveDate)
OR LAG(Role) OVER (PARTITION BY UserId ORDER BY EffectiveDate) IS NULL)
Result returns a single row for UserId '987654' even no there weren't any changes across any of the rows for that user - the date isn't considered a changed value. Can users that have no changes across any rows not show in the result?
UserId EffectiveDate Role PostCode Suburb State Country
123456 2024-09-01 Supervisor 2000 Suburb1 State1 Country1
123456 2024-09-03 Supervisor 2001 Suburb1 State1 Country1
123456 2024-09-09 Supervisor 2002 Suburb2 State2 Country1
987654 2024-09-09 Leader 3000 Suburb1 State1 Country1
The following query first creates your example table (please prepare this by yourself in future questions, it would be convenient for people willing to help you), and then uses the LAG
Window Function to get your expected result.
WITH
myTable AS ( (
SELECT 123456 AS UserId, DATE('2024-09-01') AS EffectiveDate, 'Supervisor' AS Role, 2000 AS PostCode, 'Suburb1' AS Suburb, 'State1' AS State, 'Country1' AS Country UNION ALL
SELECT 123456, DATE('2024-09-03'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-05'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-07'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-09'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-11'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-13'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-15'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 123456, DATE('2024-09-17'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-09'), 'Supervisor', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-11'), 'Supervisor', 3001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-13'), 'Supervisor', 3001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-15'), 'Supervisor', 3001, 'Suburb1', 'State1', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-17'), 'Supervisor', 3002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-19'), 'Supervisor', 3002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-21'), 'Supervisor', 3002, 'Suburb2', 'State2', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-23'), 'Leader', 3003, 'Suburb3', 'State3', 'Country1' UNION ALL
SELECT 987654, DATE('2024-09-25'), 'Supervisor', 3003, 'Suburb3', 'State3', 'Country1')
ORDER BY
UserId,
EffectiveDate )
SELECT
UserId,
EffectiveDate,
Role,
PostCode,
Suburb,
State,
Country,
FROM
myTable
QUALIFY
(PostCode != LAG(Postcode) OVER (PARTITION BY UserId ORDER BY EffectiveDate)
OR LAG(Postcode) OVER (PARTITION BY UserId ORDER BY EffectiveDate) IS NULL)
AND (NOT Role != LAG(Role) OVER (PARTITION BY UserId ORDER BY EffectiveDate)
OR LAG(Role) OVER (PARTITION BY UserId ORDER BY EffectiveDate) IS NULL)
QUALIFY
filters the result, after the analytic functions are evaluated. The criteria after the OR
is needed to catch the first row, where LAG(Postcode)
resp. LAG(Role)
evaluate to NULL
.
The result of this query is the following:
UserId EffectiveDate Role PostCode Suburb State Country
123456 2024-09-01 Supervisor 2000 Suburb1 State1 Country1
123456 2024-09-03 Supervisor 2001 Suburb1 State1 Country1
123456 2024-09-09 Supervisor 2002 Suburb2 State2 Country1
987654 2024-09-09 Supervisor 3000 Suburb1 State1 Country1
987654 2024-09-11 Supervisor 3001 Suburb1 State1 Country1
987654 2024-09-17 Supervisor 3002 Suburb2 State2 Country1