sqlgoogle-bigquerybigquery-udf

BigQuery - Picking latest not null value within 28 interval


I'm trying to add a column on this table and stuck for a little while

ID Category 1 Date Data1
A 1 2022-05-30 21
B 2 2022-05-21 15
A 2 2022-05-02 33
A 1 2022-02-11 3
B 2 2022-05-01 19
A 1 2022-05-15 null
A 1 2022-05-20 11
A 2 2022-04-20 22

to

ID Category 1 Date Data1 Picked_Data
A 1 2022-05-30 21 11
B 2 2022-05-21 15 19
A 2 2022-05-02 33 22
A 1 2022-02-11 3 some number or null
B 2 2022-05-01 19 some number or null
A 1 2022-05-15 null some number or null
A 1 2022-05-20 11 some number or null
A 2 2022-04-20 22 some number or null

The logic is to partition by Category1 and ID then pick the latest none null value within the past 28 days. If there is no data exist, it'll be null

For the first row, ID = A and Category 1, it will pick 7th row as they are in the same category, ID and the date difference is <= 28. It skipped row 4th and 6th as the date is too far back and null value.

I've tried querying this by

select first_value(Data1) over (partition bty Category1 order by case when Data1 is not null and Date between Date - Inteverval 28 DAY and Date then 1 else 2) as Picked_Data

but it's picking incorrect rows,my guess is this query

Date between Date - Inteverval 28 DAY and Date

is not picking the correct date.. could anyone give me advise/suggestion how I could twick this query?


Solution

  • Consider below approach

    select *, 
      first_value(data1 ignore nulls) over past_28_days as picked_data
    from your_table
    window past_28_days as (
      partition by id, category_1 
      order by unix_date(date) 
      range between 29 preceding and 1 preceding
    )                
    

    if applied to sample data in your question - output is

    enter image description here