id | flow | amount | date |
---|---|---|---|
1 | IN | 100 | 2023-08-01 |
2 | OUT | 100 | 2023-08-02 |
3 | IN | 100 | 2023-08-03 |
4 | OUT | 100 | 2023-08-04 |
5 | OUT | 100 | 2023-08-05 |
In a table like above, I want to sum all OUT transactions which take place after every IN transaction. The desired outcome should look like below:
id | flow | amount | date | out_amount |
---|---|---|---|---|
1 | IN | 100 | 2023-08-01 | 100 |
3 | IN | 100 | 2023-08-03 | 200 |
How could this be achieved?
If you are using Presto then you can follow gaps-and-islands approach (based on provided data I would use id
as ordering field, but for actual you might need/want to use date
):
-- sample data
with dataset(id, flow, amount, date) as (
values (1, 'IN', 100, '2023-08-01'),
(2, 'OUT', 100, '2023-08-02'),
(3, 'IN', 100, '2023-08-03'),
(4, 'OUT', 100, '2023-08-04'),
(5, 'OUT', 100, '2023-08-05'),
(6, 'IN', 100, '2023-08-11')
)
-- query
select min(id) id,
min_by(flow, id) flow,
min_by(amount, id) amount,
min_by(date, id) date,
sum(if(flow = 'OUT', amount)) out
from (select *,
count_if(flow = 'IN') over (order by id) as grp
from dataset)
group by grp
order by id;
Output:
id | flow | amount | date | out |
---|---|---|---|---|
1 | IN | 100 | 2023-08-01 | 100 |
3 | IN | 100 | 2023-08-03 | 200 |
6 | IN | 100 | 2023-08-11 | NULL |
For Trino you can also use MATCH_RECOGNIZE
:
-- query
SELECT *
FROM dataset MATCH_RECOGNIZE(
ORDER BY id
MEASURES
A.id as id,
A.flow as flow,
A.amount as amount,
A.date as date,
SUM(B.amount) AS out
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A B*)
DEFINE
A AS flow = 'IN',
B AS flow = 'OUT'
);