Small change in the requirement
Needed Teradata SQL query to create a grouping from the first transaction to its subsequent transactions. If a subsequent transaction is made within 30 days of the previous transaction, consider them all to belongs to the same group. Break the group and start a new one if any transaction is made more than 30 days after the previous transaction.
data:
trans_Id | customer_id | trans_date |
---|---|---|
001 | 1101 | 2020-11-02 |
002 | 1101 | 2020-11-14 |
003 | 1101 | 2020-11-18 |
004 | 1101 | 2021-12-04 |
005 | 1101 | 2021-01-06 |
006 | 1101 | 2021-01-08 |
007 | 1101 | 2021-02-17 |
008 | 1101 | 2021-03-01 |
009 | 1101 | 2021-03-04 |
010 | 1102 | 2021-03-02 |
011 | 1102 | 2021-03-08 |
012 | 1102 | 2021-04-01 |
013 | 1102 | 2021-04-02 |
014 | 1102 | 2021-04-12 |
015 | 1102 | 2021-04-29 |
016 | 1102 | 2021-06-10 |
017 | 1102 | 2021-06-12 |
Expected result (expecting grouping like below). from 001 till 004 fall under within 30 day to its previous transaction. 005 has elapsed 30 day from it's previous transaction so new group has to be started.
trans_Id | customer_id | trans_date | Group |
---|---|---|---|
001 | 1101 | 2020-11-02 | 1 |
002 | 1101 | 2020-11-14 | 1 |
003 | 1101 | 2020-11-18 | 1 |
004 | 1101 | 2021-12-04 | 1 |
005 | 1101 | 2021-01-06 | 2 |
006 | 1101 | 2021-01-08 | 2 |
007 | 1101 | 2021-02-17 | 3 |
008 | 1101 | 2021-03-01 | 3 |
009 | 1101 | 2021-03-04 | 3 |
010 | 1102 | 2021-03-02 | 1 |
011 | 1102 | 2021-03-08 | 1 |
012 | 1102 | 2021-04-01 | 2 |
013 | 1102 | 2021-04-02 | 2 |
014 | 1102 | 2021-04-12 | 2 |
015 | 1102 | 2021-04-29 | 2 |
016 | 1102 | 2021-06-10 | 3 |
017 | 1102 | 2021-06-12 | 3 |
From Oracle 12, you can use MATCH_RECOGNIZE
to perform row-by-row pattern matching:
SELECT *
FROM table_name
MATCH_RECOGNIZE(
PARTITION BY customer_id
ORDER BY trans_date
MEASURES
MATCH_NUMBER() AS grp
ALL ROWS PER MATCH
PATTERN ( within_30_days+ )
DEFINE
within_30_days AS trans_date <= FIRST(trans_date) + INTERVAL '30' DAY
)
Which, for the sample data:
CREATE TABLE table_name (trans_id, customer_id, trans_date) AS
SELECT '001', '1101', DATE '2020-11-02' FROM DUAL UNION ALL
SELECT '002', '1101', DATE '2020-11-14' FROM DUAL UNION ALL
SELECT '003', '1101', DATE '2020-11-18' FROM DUAL UNION ALL
SELECT '004', '1101', DATE '2021-12-04' FROM DUAL UNION ALL
SELECT '005', '1101', DATE '2021-12-05' FROM DUAL UNION ALL
SELECT '006', '1101', DATE '2021-12-08' FROM DUAL UNION ALL
SELECT '007', '1101', DATE '2021-01-17' FROM DUAL UNION ALL
SELECT '008', '1101', DATE '2021-05-01' FROM DUAL UNION ALL
SELECT '009', '1101', DATE '2021-05-04' FROM DUAL UNION ALL
SELECT '010', '1102', DATE '2021-03-02' FROM DUAL UNION ALL
SELECT '011', '1102', DATE '2021-03-08' FROM DUAL UNION ALL
SELECT '012', '1102', DATE '2021-04-01' FROM DUAL UNION ALL
SELECT '013', '1102', DATE '2021-04-02' FROM DUAL UNION ALL
SELECT '014', '1102', DATE '2021-04-12' FROM DUAL UNION ALL
SELECT '015', '1102', DATE '2021-04-29' FROM DUAL UNION ALL
SELECT '016', '1102', DATE '2021-06-10' FROM DUAL UNION ALL
SELECT '017', '1102', DATE '2021-06-12' FROM DUAL;
Outputs:
CUSTOMER_ID | TRANS_DATE | GRP | TRANS_ID |
---|---|---|---|
1101 | 2020-11-02 00:00:00 | 1 | 001 |
1101 | 2020-11-14 00:00:00 | 1 | 002 |
1101 | 2020-11-18 00:00:00 | 1 | 003 |
1101 | 2021-01-17 00:00:00 | 2 | 007 |
1101 | 2021-05-01 00:00:00 | 3 | 008 |
1101 | 2021-05-04 00:00:00 | 3 | 009 |
1101 | 2021-12-04 00:00:00 | 4 | 004 |
1101 | 2021-12-05 00:00:00 | 4 | 005 |
1101 | 2021-12-08 00:00:00 | 4 | 006 |
1102 | 2021-03-02 00:00:00 | 1 | 010 |
1102 | 2021-03-08 00:00:00 | 1 | 011 |
1102 | 2021-04-01 00:00:00 | 1 | 012 |
1102 | 2021-04-02 00:00:00 | 2 | 013 |
1102 | 2021-04-12 00:00:00 | 2 | 014 |
1102 | 2021-04-29 00:00:00 | 2 | 015 |
1102 | 2021-06-10 00:00:00 | 3 | 016 |
1102 | 2021-06-12 00:00:00 | 3 | 017 |