sqloracle-databaseetlbulk-operations

Implementing Type 2 SCD in Oracle


First I would like to say that I am new to the stackoverflow community and relatively new to SQL itself and so please pardon me If I didn't format my question right or didn't state my requirements clearly.

I am trying to implement a type 2 SCD in Oracle. The structure of the source table (customer_records) is given below.

CREATE TABLE customer_records(
    day date,
    snapshot_day number,
    vendor_id number,
    customer_id number,
    rank number
);

INSERT INTO customer_records 
(day,snapshot_day,vendor_id,customer_id,rank)
VALUES
(9/24/2014,6266,71047795,476095,3103),
(10/1/2014,6273,71047795,476095,3103),
(10/8/2014,6280,71047795,476095,3103),
(10/15/2014,6287,71047795,476095,3103),
(10/22/2014,6291,71047795,476095,3102),
(10/29/2014,6330,71047795,476095,3102),
(11/05/2015,6351,71047795,476095,3102),
(11/12/2015,6440,71047795,476095,3103);

The above table is updated weekly and I have pulled records for a particular customer represented by vendor_id and customer_id. Such that each customer will have a unique vendor_id and customer_id. I am trying to track the changes in the tier (rank) of a customer. It may so happen that the customer's tier may remain same for several weeks and we are only willing to track when there is a change in the tier of the customer.

The desired output (dimension table) would look something like this:

SK  Version   Date_From    Date_To    Vendor_id   Customer_Id  Rank_Id

1     1       9/24/2014    10/22/2014    71047795            476095       3103
2     2       10/22/2014   11/05/2015    71047795            476095       3102
3     3       11/05/2015   12/31/2199    71047795            476095       3103

Such that whenever customer's tier hit a change we track that in a new table. Also, wanting to include the current_flag = 'Y' for the most current tier.

I want to be able to do it using merge.


Solution

  • Here is an approach to group consecutive records having the same tier, while detecting changes.

    The idea is to self-join the table, and to relate each record to the next record that has a different tier. This is done using a NOT EXISTS condition with a correlated subquery.

    LEFT JOIN is needed, to avoid filtering out the last record (that owns the current tier), which does not have a next record yet : for this record, we use COALESCE() to set up a default end date.

    SELECT 
        c1.day day_from,
        COALESCE(c2.day, TO_DATE('2199-12-31', 'yyyy-mm-dd')) day_to,
        c1.Vendor_ID,
        c1.Customer_ID, 
        c1.rank
    FROM customer_records c1
    LEFT JOIN customer_records c2 
        ON  c2.Vendor_ID = c1.Vendor_ID
        AND c2.Customer_ID         = c1.Customer_ID
        AND c2.rank <> c1.rank
        AND c2.DAY                 > c1.DAY
        AND NOT EXISTS (
            SELECT 1
            FROM customer_records c3
            WHERE
                    c3.Vendor_ID = c1.Vendor_ID
                AND c3.Customer_ID         = c1.Customer_ID
                AND c3.rank <> c1.rank
                AND c3.DAY                 > c1.DAY
                AND c3.DAY                 < c2.DAY
        )
    

    This returns :

     DAY_FROM  | DAY_TO    | Vendor_ID | Customer_ID | rank
     :-------- | :-------- | ------------------: | ----------: | -----------------:
     24-SEP-14 | 22-OCT-14 |            71047795 |      476095 |               3103
     01-OCT-14 | 22-OCT-14 |            71047795 |      476095 |               3103
     08-OCT-14 | 22-OCT-14 |            71047795 |      476095 |               3103
     15-OCT-14 | 22-OCT-14 |            71047795 |      476095 |               3103
     22-OCT-14 | 12-NOV-15 |            71047795 |      476095 |               3102
     29-OCT-14 | 12-NOV-15 |            71047795 |      476095 |               3102
     05-NOV-15 | 12-NOV-15 |            71047795 |      476095 |               3102
     12-NOV-15 | 31-DEC-99 |            71047795 |      476095 |               3103
    

    Now we can group the record set by tier and end date to generate the expected results. ROW_NUMBER() can give you the version number. It is also easy to check which record is the current one, as explained above.

    SELECT 
        ROW_NUMBER() OVER(ORDER BY c2.day) version,
        DECODE(c2.day, NULL, 'Y') current_flag,
        MIN(c1.day) day_from,
        COALESCE(c2.day, TO_DATE('2199-12-31', 'yyyy-mm-dd')) day_to,
        c1.Vendor_ID,
        c1.Customer_ID, 
        c1.rank
    FROM customer_records c1
    LEFT JOIN customer_records c2 
        ON  c2.Vendor_ID = c1.Vendor_ID
        AND c2.Customer_ID         = c1.Customer_ID
        AND c2.rank <> c1.rank
        AND c2.DAY                 > c1.DAY
        AND NOT EXISTS (
            SELECT 1
            FROM customer_records c3
            WHERE
                    c3.Vendor_Id = c1.Vendor_Id
                AND c3.Customer_ID         = c1.Customer_ID
                AND c3.rank <> c1.rank
                AND c3.DAY                 > c1.DAY
                AND c3.DAY                 < c2.DAY
        )
    GROUP BY
        c1.Vendor_Id, 
        c1.Customer_ID, 
        c1.rank, 
        c2.day
    ORDER BY
        day_from
    

    Results :

    VERSION | CURRENT_FLAG | DAY_FROM  | DAY_TO    | Vendor_ID | Customer_ID | rank
    ------: | :----------- | :-------- | :-------- | ------------------: | ----------: | -----------------:
          1 | N            | 24-SEP-14 | 22-OCT-14 |            71047795 |      476095 |               3103
          2 | N            | 22-OCT-14 | 12-NOV-15 |            71047795 |      476095 |               3102
          3 | Y            | 12-NOV-15 | 31-DEC-99 |            71047795 |      476095 |               3103
    

    In Oracle you can turn any select into a merge query using the MERGE syntax. You can match on all columns expected current_flag and day_to, and update these if a record already exists ; else, just insert a new one.

    MERGE INTO dimensions dim
    USING (
       -- above query goes here --
    ) cust 
        ON  dim.DAY_FROM            = cust.DAY_FROM
        AND dim.vendor_id = cust.vendor_id
        AND dim.Customer_ID         = cust.Customer_ID
        AND dim.rank  = cust.rank
    WHEN MATCHED THEN UPDATE SET 
        dim.DAY_TO = cust.DAY_TO,
        dim.CURRENT_FLAG = cust.CURRENT_FLAG
    WHEN NOT MATCHED THEN 
        INSERT (
            dim.DAY_FROM, 
            dim.VERSION, 
            dim.CURRENT_FLAG, 
            dim.DAY_FROM, 
            dim.DAY_TO, 
            dim.vendor_id, 
            dim.customer_id, 
            dim.rank
        ) VALUES (
            cust.DAY_FROM, 
            cust.VERSION, 
            cust.CURRENT_FLAG, 
            cust.DAY_FROM, 
            cust.DAY_TO, 
            cust.vendor_id, 
            cust.Customer_ID, 
            cust.rank
        )