sqlgoogle-bigquery

Get the same not null value for all the records in a window


I want to get the same first/last not null value for all the records in a same window. I've prepared this example data:

WITH sample_data AS (
  SELECT 1 AS id, NULL AS value, CURRENT_TIMESTAMP() AS update_ts

  UNION ALL
  
  SELECT 1, "Good", TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)

  UNION ALL

  SELECT 1, "Bad", TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 DAY)
)

SELECT
  id,
  value,
  FIRST_VALUE(value IGNORE NULLS) OVER (ids) AS first_valid_value,
  LAST_VALUE(value IGNORE NULLS) OVER (ids) AS last_valid_value,
  update_ts
FROM sample_data
WINDOW ids AS (PARTITION BY id ORDER BY update_ts DESC)

This is the output for this query:

id value first_valid_value last_valid_value update_ts
1 2024-12-13 12:37:05.762489 UTC
1 Good Good Good 2024-12-12 12:37:05.762489 UTC
1 Bad Good Bad 2024-12-11 12:37:05.762489 UTC

There are two things which are confusing me:

  1. I added the INGORE NULLS part because I want to get a "valid" value, so I was expecting the first row (with NULL value field), to inherit the "Good" value in the first_valid_value field, and "Bad" for last_valid_value
  2. In row #2, I was surprised that field last_valid_value didn't inherited the "Bad" value, as the window function is defining

This is the output I want to achieve and which I was expecting:

id value first_valid_value last_valid_value update_ts
1 Good Bad 2024-12-13 12:37:05.762489 UTC
1 Good Good Bad 2024-12-12 12:37:05.762489 UTC
1 Bad Good Bad 2024-12-11 12:37:05.762489 UTC

Why is this happening, and how could I achieve the final result?


Solution

  • You need to specify the window frame clause:

    SELECT
      id,
      value,
      FIRST_VALUE(value IGNORE NULLS) OVER (ids) AS first_valid_value,
      LAST_VALUE(value IGNORE NULLS) OVER (ids) AS last_valid_value,
      update_ts
    FROM sample_data
    WINDOW ids AS (PARTITION BY id ORDER BY update_ts DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
    

    Output: query output

    From the docs:

    For aggregate analytic functions, if the ORDER BY clause is present but the window frame clause is not, the following window frame clause is used by default:

    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW