I have a SQL table on Impala that contains ID
, dt
(monthly basis with no skipped month), and status
of each person ID. I want to check how long that each ID is in each status (my expected answer is shown on expected
column)
I tried to solve this problem on the value
column by using
count(status) over (partition by ID, status order by dt)
but it doesn't reset the value
when the status
is changed.
+------+------------+--------+-------+----------+
| ID | dt | status | value | expected |
+------+------------+--------+-------+----------+
| 0001 | 01/01/2020 | 0 | 1 | 1 |
| 0001 | 01/02/2020 | 0 | 2 | 2 |
| 0001 | 01/03/2020 | 1 | 1 | 1 |
| 0001 | 01/04/2020 | 1 | 2 | 2 |
| 0001 | 01/05/2020 | 1 | 3 | 3 |
| 0001 | 01/06/2020 | 0 | 3 | 1 |
| 0001 | 01/07/2020 | 1 | 4 | 1 |
| 0001 | 01/08/2020 | 1 | 5 | 2 |
+------+------------+--------+-------+----------+
Is there anyway to reset the counter when the status
is changed?
When you partition by ID
and status
, two groups are formed for the values 0
and 1
in status
field. So, the months 1, 2, 6
go into first group with 0
status and the months 3, 4, 5, 7, 8
go into the second group with 1
status. Then, the count function counts the number of statuses individually in those groups. Thus the first group has counts from 1
to 3
and the second group has counts from 1
to 5
. This query so far doesn't account for the change in statuses rather just simply divide the record set as per different status values.
One approach would be to divide the records into different blocks where each status change starts a new block. The below query follows this approach and gives the expected result:
SELECT ID,dt,status,
COUNT(status) OVER(PARTITION BY ID,block_number ORDER BY dt) as value
FROM (
SELECT ID,dt,status,
SUM(change_in_status) OVER(PARTITION BY ID ORDER BY dt) as block_number
FROM(
SELECT ID,dt,status,
CASE WHEN
status<>LAG(status) OVER(PARTITION BY ID ORDER BY dt)
OR LAG(status) OVER(PARTITION BY ID ORDER BY dt) IS NULL
THEN 1
ELSE 0
END as change_in_status
FROM statuses
) derive_status_changes
) derive_blocks;
Here is a working example in DB Fiddle.