sqlgoogle-bigqueryanalytics

Google Big Query: Forward Filling: IGNORE in Window Function


I am having difficulty using a window function to "forward fill" values in Google Big Query. It appears that the IGNORE part of the query is causing the problem. Assuming the dataset below, I am trying to get the "purchase dates" values for each social security number to fill in the nulls until another validate purchase date is encountered. Basically want to keep track of the latest purchase date in relation to each activity_date.

For example, looking at SS # 000-0000-000, I would like 2016-12-16 to forward fill all of the null values between activity dates 2016-12-17 to 2016-12-22 (purchase ages 1-6) and then forward fill 2016-12-23 for purchase ages 8-10. Same scenario for the other SS # 111-1111-111.

Activity_date   SS_Number   First_Purchase  First_Purchase_age  Purchase_dates
2016-12-26  000-0000-000    2016-12-16  10  null
2016-12-25  000-0000-000    2016-12-16  9   null
2016-12-24  000-0000-000    2016-12-16  8   null
2016-12-23  000-0000-000    2016-12-16  7   2016-12-23
2016-12-22  000-0000-000    2016-12-16  6   null
2016-12-21  000-0000-000    2016-12-16  5   null
2016-12-20  000-0000-000    2016-12-16  4   null
2016-12-19  000-0000-000    2016-12-16  3   null
2016-12-18  000-0000-000    2016-12-16  2   null
2016-12-17  000-0000-000    2016-12-16  1   null
2016-12-16  000-0000-000    2016-12-16  0   2016-12-16
2016-11-26  111-1111-111    2016-11-16  10  null
2016-11-25  111-1111-111    2016-11-16  9   null
2016-11-24  111-1111-111    2016-11-16  8   null
2016-11-23  111-1111-111    2016-11-16  7   2016-11-23
2016-11-22  111-1111-111    2016-11-16  6   null
2016-11-21  111-1111-111    2016-11-16  5   null
2016-11-20  111-1111-111    2016-11-16  4   null
2016-11-19  111-1111-111    2016-11-16  3   null
2016-11-18  111-1111-111    2016-11-16  2   null
2016-11-17  111-1111-111    2016-11-16  1   null
2016-11-16  111-1111-111    2016-11-16  0   2016-11-16

Here is the query I have:

SELECT
  activity_date,
  ss_number,
  first_purchase,
  first_purchase_age,
  purchase_dates,
  LAST_VALUE(purchase_dates) IGNORE NULLS OVER (PARTITION BY ss_number ORDER BY activity_date DESC ROWS BETWEEN UNBOUNDED PRECEEDING AND CURRENT ROW),
FROM
  [TABLE]

Solution

  • Below gives you what you described

    #standardSQL
    SELECT
      activity_date,
      ss_number,
      first_purchase,
      first_purchase_age,
      purchase_dates,
      MAX(purchase_dates) OVER (PARTITION BY ss_number ORDER BY activity_date 
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
      ) AS filled_purchase_dates
    FROM   yourTable
    

    Above "workaround" works for your specific use case, but in other cases can require more coding, thus ability IGNORE NULLS still would be great!
    So, as of IGNORE in Window Function - see Support IGNORE NULLS/RESPECT NULLS for analytic and aggregate functions ticket. The expected syntax would be

    SELECT LAST_VALUE(x IGNORE NULLS) OVER (...) FROM yourTable