sqljoinimpalaranklead

Impala compare consecutive rows and insert identical row if there are no values


I have a table that gives me data every month and I need that range of time. I noticed that sometimes I don't have data for 3/4 month but I need to duplicate the last row available with the missing timestamp.

Example:

product_id total_revenue yearmonth
1 50 202201
2 17 202201
3 30 202201
1 67 202202
2 31 202202
1 67 202203
2 31 202203
3 33 202203

But I need an output like:

product_id total_revenue yearmonth
1 50 202201
2 17 202201
3 30 202201
1 67 202202
2 31 202202
3 30 202202
1 67 202203
2 31 202203
3 33 202203

I have a select statement like:

select 
    product_id, total_revenue, yearmonth 
from 
    revenue

I found a similar question, (Postgresql compare consecutive rows and insert identical row if there are no values) but in Impala I have not the lateral join, does anybody know how can I do?


Solution

  • I did it!

    with crossed as
    (
    select
    product_id,id_month,
    rank() over (partition by product_id order by id_month asc) as r
    from
    (
    select distinct cast(id_month as string) as id_month
    from calendar d
    where day_data <= date_sub(now(), interval 1 month)
    ) a
    cross join
    (select product_id, min(concat(year,month)) as minimum
    from revenue
    group by product_id
    ) b
    where a.id_month >= b.minimum
    )
    , created as
    (
    select
    coalesce(a.product_id,b.product_id) as product_id,
    coalesce(concat(a.year,a.month),b.id_month) as id_month,
    a.total_revenue,
    b.r
    from revenue a
    full outer join crossed b
    on a.product_id=b.product_id and concat(a.year,a.month)=b.id_month
    where a.year is null
    )
    ,
    real as
    (
    select
    coalesce(a.product_id,b.product_id) as product_id,
    coalesce(concat(a.year,a.month),b.id_month) as id_month,
    a.total_revenue,
    b.r
    from revenue a
    full outer join crossed b
    on a.product_id=b.product_id and concat(a.year,a.month)=b.id_month
    where a.year is not null
    )
    select product_id,id_month,total_revenue,'CREATED' as tipe
    from
    (
    select created.product_id,created.id_month,real.total_revenue,
    rank () over (partition by created.product_id,created.id_month order by (created.r-real.r) asc) as r
    from
    created left join real on created.product_id=real.product_id
    and created.id_month > real.id_month
    )a
    where r=1
    union
    select product_id,concat(year,month) as id_month,total_revenue,'REAL' as tipe
    from revenue