sqlgoogle-bigqueryimputation

SQL, backward fill only for first nulls, forward fill only for last nulls, linear interpolation for other nulls


How do I fill backward, forward, and linear interpolation of a column in a table that has column timestamp in bigquery?

I have this table:

timestamp mycol
1 null
2 null
3 69
4 null
5 71
6 72
7 null

I expect outcome like this:

timestamp mycol
1 69
2 69
3 69
4 70
5 71
6 72
7 72

Solution

  • see below query, the result is inspired from https://justrocketscience.com/post/interpolation_sql/ however, my query handles the nulls if last row is a null

     WITH input_data as (
      SELECT 1 as _timestamp,   null as my_col
      UNION ALL
      SELECT 2 as _timestamp,   null as my_col
      UNION ALL
      SELECT 3 as _timestamp,   69 as my_col
      UNION ALL
      SELECT 4 as _timestamp,   null as my_col
      UNION ALL
      SELECT 5 as _timestamp,   71 as my_col
      UNION ALL
      SELECT 6 as _timestamp,   72 as my_col
      UNION ALL
      SELECT 7 as _timestamp,   null as my_col
    ) SELECT *,
        cast(case when input_data.my_col is not null then input_data.my_col
            when lag(input_data.my_col) over (ORDER BY _timestamp, my_col) is not null and 
                lead(input_data.my_col) over (ORDER BY _timestamp, my_col) is not null 
                then ( lag(input_data.my_col) over (ORDER BY _timestamp, my_col) + 
                    lead(input_data.my_col) over (ORDER BY _timestamp, my_col) ) / 2
            else IFNULL(FIRST_VALUE(my_col IGNORE NULLS)  OVER (item_window ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING ),
                        LAG(my_col) OVER (ORDER BY _timestamp, my_col))
            end 
            as int64)
            as new_value 
    FROM input_data
    WINDOW item_window as (ORDER BY input_data._timestamp)
    

    See query output in the screenshot

    enter image description here