Write Hive SQL query to get the output from the input shown below.
I am trying to add rank column which is ordered by event_ts. But if there is same item_id for any consecutive rows, they should get the same rank. Partition by item_id will not work here as it will rank each partition window starting from 1. But I want the rank to monotonically increase.
NOTE: dense_rank() will not solve this
Input:
event_ts | item_id |
---|---|
T1 | AA |
T2 | XX |
T3 | CC |
T4 | CC |
T5 | CC |
T6 | AA |
Output:
event_ts | item_id | rank |
---|---|---|
T1 | AA | 1 |
T2 | XX | 2 |
T3 | CC | 3 |
T4 | CC | 3 |
T5 | CC | 3 |
T6 | AA | 4 |
This should give you what you're looking for. It will keep the same rank if the row above has the same value, then will continue incrementally ranking.
Select
event_ts,
item_id,
sum(inc) over(order by event_ts) as rank_order
from (
select *,
case when lag(item_id) over (order by event_ts) = item_id
then 0 else 1 end as inc
from table1
) table2
table1
event_ts | item_id |
---|---|
T1 | AA |
T2 | XX |
T3 | CC |
T4 | CC |
T5 | CC |
T6 | AA |
Results
event_ts | item_id | rank_order |
---|---|---|
T1 | AA | 1 |
T2 | XX | 2 |
T3 | CC | 3 |
T4 | CC | 3 |
T5 | CC | 3 |
T6 | AA | 4 |