First I would like to say that I am new to the stackoverflow community and relatively new to SQL itself and so please pardon me If I didn't format my question right or didn't state my requirements clearly.
I am trying to implement a type 2 SCD in Oracle. The structure of the source table (customer_records
) is given below.
CREATE TABLE customer_records(
day date,
snapshot_day number,
vendor_id number,
customer_id number,
rank number
);
INSERT INTO customer_records
(day,snapshot_day,vendor_id,customer_id,rank)
VALUES
(9/24/2014,6266,71047795,476095,3103),
(10/1/2014,6273,71047795,476095,3103),
(10/8/2014,6280,71047795,476095,3103),
(10/15/2014,6287,71047795,476095,3103),
(10/22/2014,6291,71047795,476095,3102),
(10/29/2014,6330,71047795,476095,3102),
(11/05/2015,6351,71047795,476095,3102),
(11/12/2015,6440,71047795,476095,3103);
The above table is updated weekly and I have pulled records for a particular customer represented by vendor_id
and customer_id
. Such that each customer will have a unique vendor_id
and customer_id
. I am trying to track the changes in the tier (rank
) of a customer. It may so happen that the customer's tier may remain same for several weeks and we are only willing to track when there is a change in the tier of the customer.
The desired output (dimension table) would look something like this:
SK Version Date_From Date_To Vendor_id Customer_Id Rank_Id
1 1 9/24/2014 10/22/2014 71047795 476095 3103
2 2 10/22/2014 11/05/2015 71047795 476095 3102
3 3 11/05/2015 12/31/2199 71047795 476095 3103
Such that whenever customer's tier hit a change we track that in a new table. Also, wanting to include the current_flag
= 'Y' for the most current tier.
I want to be able to do it using merge.
Here is an approach to group consecutive records having the same tier, while detecting changes.
The idea is to self-join the table, and to relate each record to the next record that has a different tier. This is done using a NOT EXISTS
condition with a correlated subquery.
LEFT JOIN
is needed, to avoid filtering out the last record (that owns the current tier), which does not have a next record yet : for this record, we use COALESCE()
to set up a default end date.
SELECT
c1.day day_from,
COALESCE(c2.day, TO_DATE('2199-12-31', 'yyyy-mm-dd')) day_to,
c1.Vendor_ID,
c1.Customer_ID,
c1.rank
FROM customer_records c1
LEFT JOIN customer_records c2
ON c2.Vendor_ID = c1.Vendor_ID
AND c2.Customer_ID = c1.Customer_ID
AND c2.rank <> c1.rank
AND c2.DAY > c1.DAY
AND NOT EXISTS (
SELECT 1
FROM customer_records c3
WHERE
c3.Vendor_ID = c1.Vendor_ID
AND c3.Customer_ID = c1.Customer_ID
AND c3.rank <> c1.rank
AND c3.DAY > c1.DAY
AND c3.DAY < c2.DAY
)
This returns :
DAY_FROM | DAY_TO | Vendor_ID | Customer_ID | rank
:-------- | :-------- | ------------------: | ----------: | -----------------:
24-SEP-14 | 22-OCT-14 | 71047795 | 476095 | 3103
01-OCT-14 | 22-OCT-14 | 71047795 | 476095 | 3103
08-OCT-14 | 22-OCT-14 | 71047795 | 476095 | 3103
15-OCT-14 | 22-OCT-14 | 71047795 | 476095 | 3103
22-OCT-14 | 12-NOV-15 | 71047795 | 476095 | 3102
29-OCT-14 | 12-NOV-15 | 71047795 | 476095 | 3102
05-NOV-15 | 12-NOV-15 | 71047795 | 476095 | 3102
12-NOV-15 | 31-DEC-99 | 71047795 | 476095 | 3103
Now we can group the record set by tier and end date to generate the expected results. ROW_NUMBER()
can give you the version number. It is also easy to check which record is the current one, as explained above.
SELECT
ROW_NUMBER() OVER(ORDER BY c2.day) version,
DECODE(c2.day, NULL, 'Y') current_flag,
MIN(c1.day) day_from,
COALESCE(c2.day, TO_DATE('2199-12-31', 'yyyy-mm-dd')) day_to,
c1.Vendor_ID,
c1.Customer_ID,
c1.rank
FROM customer_records c1
LEFT JOIN customer_records c2
ON c2.Vendor_ID = c1.Vendor_ID
AND c2.Customer_ID = c1.Customer_ID
AND c2.rank <> c1.rank
AND c2.DAY > c1.DAY
AND NOT EXISTS (
SELECT 1
FROM customer_records c3
WHERE
c3.Vendor_Id = c1.Vendor_Id
AND c3.Customer_ID = c1.Customer_ID
AND c3.rank <> c1.rank
AND c3.DAY > c1.DAY
AND c3.DAY < c2.DAY
)
GROUP BY
c1.Vendor_Id,
c1.Customer_ID,
c1.rank,
c2.day
ORDER BY
day_from
Results :
VERSION | CURRENT_FLAG | DAY_FROM | DAY_TO | Vendor_ID | Customer_ID | rank ------: | :----------- | :-------- | :-------- | ------------------: | ----------: | -----------------: 1 | N | 24-SEP-14 | 22-OCT-14 | 71047795 | 476095 | 3103 2 | N | 22-OCT-14 | 12-NOV-15 | 71047795 | 476095 | 3102 3 | Y | 12-NOV-15 | 31-DEC-99 | 71047795 | 476095 | 3103
In Oracle you can turn any select into a merge query using the MERGE syntax. You can match on all columns expected current_flag
and day_to
, and update these if a record already exists ; else, just insert a new one.
MERGE INTO dimensions dim
USING (
-- above query goes here --
) cust
ON dim.DAY_FROM = cust.DAY_FROM
AND dim.vendor_id = cust.vendor_id
AND dim.Customer_ID = cust.Customer_ID
AND dim.rank = cust.rank
WHEN MATCHED THEN UPDATE SET
dim.DAY_TO = cust.DAY_TO,
dim.CURRENT_FLAG = cust.CURRENT_FLAG
WHEN NOT MATCHED THEN
INSERT (
dim.DAY_FROM,
dim.VERSION,
dim.CURRENT_FLAG,
dim.DAY_FROM,
dim.DAY_TO,
dim.vendor_id,
dim.customer_id,
dim.rank
) VALUES (
cust.DAY_FROM,
cust.VERSION,
cust.CURRENT_FLAG,
cust.DAY_FROM,
cust.DAY_TO,
cust.vendor_id,
cust.Customer_ID,
cust.rank
)