I was wondering if there's a way to re-write the following oracle SQL script to get a better processing time. This current version is very slow and wouldn't work the larger dataset that I'm working with.
I'm trying to assign a number to each customer based on how many years they have been with the business. And I need it to return the each customer with their start year and then count up to the present. I've tried to explain it below.
For a customer that has been with the business for 5 years I want the script to return the following
Customer code | Year Number | Number of years |
---|---|---|
CUST1 | 2018 | 0 |
CUST1 | 2019 | 1 |
CUST1 | 2020 | 2 |
CUST1 | 2021 | 3 |
CUST1 | 2022 | 4 |
CUST1 | 2023 | 5 |
I'm trying to use CONNECT BY and LEVEL for this but it's taking too long.
YEARS AS
(
SELECT CUSTOMER_CODE, (MIN_YR+LEVEL-1) AS YR_NUM, ((MIN_YR+LEVEL-1) - MIN_YR) AS YRS
FROM
(
SELECT CUSTOMER_CODE, EXTRACT (YEAR FROM MIN_DT) MIN_YR , '2023' AS CUR_YR
FROM
(
SELECT REV.CUSTOMER_CODE, MIN(REV.ORDER_DATE) AS MIN_DT
FROM REVENUE_TABLE REV
JOIN CUSTOMER_DETAILS ACC ON REV.CUSTOMER_CODE = ACC.CUSTOMER_CODE
GROUP BY REV.CUSTOMER_CODE
)
)
CONNECT BY LEVEL <= CUR_YR-MIN_YR
)
SELECT * FROM YEARS
I've used CONNECT BY and LEVEL clauses to generate a number for each customer which would represent the number of years with the business. The script is working but it would take too long for it to be useable. I let the script run for around 4 hours but it didn't finish running. The data volume is too high for this to complete.
Starting with the innermost query, there is probably no use including the customer_details table - you are not using anything in it. Its only possible use is to filter out revenue elements that are not in your customer_details table, but I really doubt you are trying to do that.
A more succinct approach would be to use a recursive CTE that starts with the earliest date for each customer, then rolls it forward by a year each time until it equals the current year.
With Recursive Years AS (
SELECT REV.CUSTOMER_CODE, Year(MIN(REV.ORDER_DATE)) AS Year_Number
FROM REVENUE_TABLE REV
GROUP BY REV.CUSTOMER_CODE
UNION ALL
Select CUSTOMER_CODE, Year_Number+1
From Years
Where Year_Number < Year(Current_Date)
)
Select * from Years
Now you can join the rest of your tables to that result or do whatever else you need with it.