sqlgoogle-bigquery

How to write a BigQuery query to find rows where a value has changed but also return the previous row before that change


I am new to BQ and need to write a query that returns rows where a particular value has changed on or after 01/09/2024 and then also return the last row before that change (unless the particular value is the same as the previous most recent change). Also, I don't want to return a row if a particular second field value has changed.

Please see screenshot of the data in bigquery with some commentary added and the result I'm looking to return.

Sample data and expected result

Appreciate any assistance in how this can be achieved.

EDIT: Example dataset where UserId '987654' has no value changes across all rows - the date isn't copnsidered a changed value.

WITH
  myTable AS ( (
    SELECT 123456 AS UserId, DATE('2024-09-01') AS EffectiveDate, 'Supervisor' AS Role, 2000 AS PostCode, 'Suburb1' AS Suburb, 'State1' AS State, 'Country1' AS Country UNION ALL
    SELECT 123456, DATE('2024-09-03'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 123456, DATE('2024-09-05'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 123456, DATE('2024-09-07'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 123456, DATE('2024-09-09'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
    SELECT 123456, DATE('2024-09-11'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
    SELECT 123456, DATE('2024-09-13'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
    SELECT 123456, DATE('2024-09-15'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
    SELECT 123456, DATE('2024-09-17'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-09'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-11'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-13'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-15'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-17'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-19'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-21'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-23'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
    SELECT 987654, DATE('2024-09-25'), 'Leader', 3000, 'Suburb1', 'State1', 'Country1')
  ORDER BY
    UserId,
    EffectiveDate )
SELECT
  UserId,
  EffectiveDate,
  Role,
  PostCode,
  Suburb,
  State,
  Country,
FROM
  myTable
QUALIFY
  (PostCode != LAG(Postcode) OVER (PARTITION BY UserId ORDER BY EffectiveDate)
    OR LAG(Postcode) OVER (PARTITION BY UserId ORDER BY EffectiveDate) IS NULL)
  AND (NOT Role != LAG(Role) OVER (PARTITION BY UserId ORDER BY EffectiveDate)
    OR LAG(Role) OVER (PARTITION BY UserId ORDER BY EffectiveDate) IS NULL)

Result returns a single row for UserId '987654' even no there weren't any changes across any of the rows for that user - the date isn't considered a changed value. Can users that have no changes across any rows not show in the result?

UserId  EffectiveDate   Role        PostCode    Suburb  State   Country
123456  2024-09-01      Supervisor  2000        Suburb1 State1  Country1
123456  2024-09-03      Supervisor  2001        Suburb1 State1  Country1
123456  2024-09-09      Supervisor  2002        Suburb2 State2  Country1
987654  2024-09-09      Leader      3000        Suburb1 State1  Country1

Solution

  • The following query first creates your example table (please prepare this by yourself in future questions, it would be convenient for people willing to help you), and then uses the LAG Window Function to get your expected result.

    WITH
      myTable AS ( (
        SELECT 123456 AS UserId, DATE('2024-09-01') AS EffectiveDate, 'Supervisor' AS Role, 2000 AS PostCode, 'Suburb1' AS Suburb, 'State1' AS State, 'Country1' AS Country UNION ALL
        SELECT 123456, DATE('2024-09-03'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
        SELECT 123456, DATE('2024-09-05'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
        SELECT 123456, DATE('2024-09-07'), 'Supervisor', 2001, 'Suburb1', 'State1', 'Country1' UNION ALL
        SELECT 123456, DATE('2024-09-09'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
        SELECT 123456, DATE('2024-09-11'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
        SELECT 123456, DATE('2024-09-13'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
        SELECT 123456, DATE('2024-09-15'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
        SELECT 123456, DATE('2024-09-17'), 'Supervisor', 2002, 'Suburb2', 'State2', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-09'), 'Supervisor', 3000, 'Suburb1', 'State1', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-11'), 'Supervisor', 3001, 'Suburb1', 'State1', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-13'), 'Supervisor', 3001, 'Suburb1', 'State1', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-15'), 'Supervisor', 3001, 'Suburb1', 'State1', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-17'), 'Supervisor', 3002, 'Suburb2', 'State2', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-19'), 'Supervisor', 3002, 'Suburb2', 'State2', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-21'), 'Supervisor', 3002, 'Suburb2', 'State2', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-23'), 'Leader', 3003, 'Suburb3', 'State3', 'Country1' UNION ALL
        SELECT 987654, DATE('2024-09-25'), 'Supervisor', 3003, 'Suburb3', 'State3', 'Country1')
      ORDER BY
        UserId,
        EffectiveDate )
    SELECT
      UserId,
      EffectiveDate,
      Role,
      PostCode,
      Suburb,
      State,
      Country,
    FROM
      myTable
    QUALIFY
      (PostCode != LAG(Postcode) OVER (PARTITION BY UserId ORDER BY EffectiveDate)
        OR LAG(Postcode) OVER (PARTITION BY UserId ORDER BY EffectiveDate) IS NULL)
      AND (NOT Role != LAG(Role) OVER (PARTITION BY UserId ORDER BY EffectiveDate)
        OR LAG(Role) OVER (PARTITION BY UserId ORDER BY EffectiveDate) IS NULL)
    

    QUALIFY filters the result, after the analytic functions are evaluated. The criteria after the OR is needed to catch the first row, where LAG(Postcode) resp. LAG(Role) evaluate to NULL.

    The result of this query is the following:

    UserId  EffectiveDate   Role        PostCode    Suburb  State   Country
    123456  2024-09-01      Supervisor  2000        Suburb1 State1  Country1
    123456  2024-09-03      Supervisor  2001        Suburb1 State1  Country1
    123456  2024-09-09      Supervisor  2002        Suburb2 State2  Country1
    987654  2024-09-09      Supervisor  3000        Suburb1 State1  Country1
    987654  2024-09-11      Supervisor  3001        Suburb1 State1  Country1
    987654  2024-09-17      Supervisor  3002        Suburb2 State2  Country1