I'm trying to add a column on this table and stuck for a little while
ID | Category 1 | Date | Data1 |
---|---|---|---|
A | 1 | 2022-05-30 | 21 |
B | 2 | 2022-05-21 | 15 |
A | 2 | 2022-05-02 | 33 |
A | 1 | 2022-02-11 | 3 |
B | 2 | 2022-05-01 | 19 |
A | 1 | 2022-05-15 | null |
A | 1 | 2022-05-20 | 11 |
A | 2 | 2022-04-20 | 22 |
to
ID | Category 1 | Date | Data1 | Picked_Data |
---|---|---|---|---|
A | 1 | 2022-05-30 | 21 | 11 |
B | 2 | 2022-05-21 | 15 | 19 |
A | 2 | 2022-05-02 | 33 | 22 |
A | 1 | 2022-02-11 | 3 | some number or null |
B | 2 | 2022-05-01 | 19 | some number or null |
A | 1 | 2022-05-15 | null | some number or null |
A | 1 | 2022-05-20 | 11 | some number or null |
A | 2 | 2022-04-20 | 22 | some number or null |
The logic is to partition by Category1 and ID then pick the latest none null value within the past 28 days. If there is no data exist, it'll be null
For the first row, ID = A and Category 1, it will pick 7th row as they are in the same category, ID and the date difference is <= 28. It skipped row 4th and 6th as the date is too far back and null value.
I've tried querying this by
select first_value(Data1) over (partition bty Category1 order by case when Data1 is not null and Date between Date - Inteverval 28 DAY and Date then 1 else 2) as Picked_Data
but it's picking incorrect rows,my guess is this query
Date between Date - Inteverval 28 DAY and Date
is not picking the correct date.. could anyone give me advise/suggestion how I could twick this query?
Consider below approach
select *,
first_value(data1 ignore nulls) over past_28_days as picked_data
from your_table
window past_28_days as (
partition by id, category_1
order by unix_date(date)
range between 29 preceding and 1 preceding
)
if applied to sample data in your question - output is