HOUR | Account_id | media_id | impressions |
---|---|---|---|
2022-11-04 04:00:00 UTC | 256789 | 35 | null |
2022-11-04 05:00:00 UTC | 256789 | 35 | null |
2022-11-04 06:00:00 UTC | 256789 | 35 | null |
2022-11-04 07:00:00 UTC | 256789 | 35 | null |
2022-11-04 08:00:00 UTC | 256789 | 35 | 40 |
2022-11-04 09:00:00 UTC | 256789 | 35 | 7 |
2022-11-04 10:00:00 UTC | 256789 | 35 | null |
2022-11-04 11:00:00 UTC | 256789 | 35 | 10 |
2022-11-04 12:00:00 UTC | 256789 | 35 | 12 |
What we are trying to do is that when the impressions is count is null for an hour, then we take the value from the impressions where it is not null and then split the number evenly across the previously consecutive null rows and the first non null row.
If we take the row where the impressions count is 40
in the above 4 rows the impressions is null so including the row where the impressions is 40
makes the count as 5
, then we divide 40
by 5
hence each hour gets 8
impressions.
The same above logic can be applied to the row where the impressions count is 10
.
It is distributed between 2
rows evenly hence in the output it is 5
impressions for each hour.
Here HOUR
column is an increment of one hour with no gaps in between.
The query looks like this:
select *,
case when impressions is null then row_number() over(partition by media_id,ACCOUNT_ID ORDER BY HOUR) else 0 end as rn1,
from table_name order by 1 ;
How I take it from there?
Expected Output:
HOUR | Account_id | media_id | impressions | distributed_impressions |
---|---|---|---|---|
2022-11-04 04:00:00 UTC | 256789 | 35 | null | 8 |
2022-11-04 05:00:00 UTC | 256789 | 35 | null | 8 |
2022-11-04 06:00:00 UTC | 256789 | 35 | null | 8 |
2022-11-04 07:00:00 UTC | 256789 | 35 | null | 8 |
2022-11-04 08:00:00 UTC | 256789 | 35 | 40 | 8 |
2022-11-04 09:00:00 UTC | 256789 | 35 | 7 | 7 |
2022-11-04 10:00:00 UTC | 256789 | 35 | null | 5 |
2022-11-04 11:00:00 UTC | 256789 | 35 | 10 | 5 |
2022-11-04 12:00:00 UTC | 256789 | 35 | 12 | 12 |
Consider below query.
SELECT * EXCEPT(part),
MAX(impressions) OVER w1 / COUNT(*) OVER W1 AS distributed_impressions
FROM (
SELECT *, COUNT(*) OVER w0 - COUNTIF(impressions IS NULL) OVER w0 AS part
FROM sample_table
WINDOW w0 AS (PARTITION BY Account_id, media_id ORDER BY HOUR ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) WINDOW w1 AS (PARTITION BY Account_id, media_id, part);
Query results
How each island is identified