sqloracle-databasesql-merge

Backfill data based or insert new records based on matching criteria


I have the following effective dated table with several thousands records that would need to be updated/inserted based on certain criteria.

Table Car_Specification:

Effective_Date Car_ID Num_Wheels Num_Doors Engine_Type Color
01-MAR-2021 C001 4 2 Turbo Red
01-APR-2021 C001 4 2 Turbo Blue
01-JUN-2021 C001 4 2 Turbo Green

Lookup Car_Change:

Effective_Date Car_ID Num_Wheels Num_Doors Engine_Type Color
01-MAR-2021 C001 4 2 Turbo White
01-MAY-2021 C001 4 2 Manual Red

Every day, changes to a car is captured inside the Car_Change table and the updates will need to be updated into Car_Specification table based on Car_ID and Effective_Date.

In the example above, I would like to query Car_Specification for the list of effective_date and car_id and perform a lookup into Car_Change table, if a match based on effective_date and car_id is found, I will update the data in the Car_Change with the data in the Car_Specification table for those fields that matches. The changes will also need to be forward filled or propagated forward.

If the effective_date and car_id doesn't match but there is a change in Car_Change, I will need to insert the new effective_date and car_id into Car_Specification and forward fill the changes.

For instance, C001 will show the following after the update:

Effective_Date Car_ID Num_Wheels Num_Doors Engine_Type Color
01-MAR-2021 C001 4 2 Turbo White
01-APR-2021 C001 4 2 Turbo White
01-MAY-2021 C001 4 2 Manual Red
01-JUN-2021 C001 4 2 Manual Red

I am thinking along the line of using the MERGE UPDATE/INSERT construct to handle the changes. Would like to seek advice on the most efficient way to handle such changes given the tables are large and I am looking to do the update/insert operation on a daily basis to keep my data inside Car_Specification updated. If I am going ahead with MERGE, how do I handle the forward filling of data?


Solution

  • As a frame challenge. Several thousands of records is a tiny data set; you do not need to merge the data. Just have a single row for each car in Car_Specification for the original specification and then if there are changes put them into Car_Changes and when you want to display the latest value then select the last change, or the specification if there were no changes.

    SELECT COALESCE(c.effective_date, s.effective_date) AS effective_date,
           s.car_id,
           COALESCE(c.num_wheels, s.num_wheels) AS num_wheels,
           COALESCE(c.num_doors, s.num_doors) AS num_doors,
           COALESCE(c.engine_type, s.engine_type) AS engine_type,
           COALESCE(c.color, s.color) AS color
    FROM   car_specification s
           LEFT OUTER JOIN LATERAL(
             SELECT *
             FROM   car_changes c
             WHERE  c.car_id = s.car_id
             AND    c.effective_date >= s.effective_date
             ORDER BY c.effective_date DESC
             FETCH FIRST ROW ONLY
           )
           ON (1 = 1) -- The join condition is inside the lateral join.
    

    If you want you can wrap it in a view.