mergedata-warehousefact-table

Data Warehouse Fact Tables - MERGE OR DROP & INSERT


I am building specific Fact Table for a Kimball-based EDW.

While loading fact table I came upon the realization on the different approaches.

Below the steps taken:

The Source Data

Sourcing Data

Lookup Surrogates

Load Data in Fact Table

Option One - Fresh Batch

DELETE FROM FactLearnerAbsenteeism WHERE SchoolYear = @CurrentYear

INSERT INTO FactLearnerAbsenteeism SELECT * FROM #Conform

Option Two - The Merge

MERGE dbo.FactLearnerAbsenteeism as DST

USING #Conform as SRC

*UPDATE CHANGES*

*INSERT NEW RECORDS*

My concern is: How I can i ensure records that no longer exist this year (deleted records) are removed. I prefer using the merge but using the DELETE option in MERGE can remove records from the previous years as they wont exist in the #Conform dataset.

Should I the rather source ALL data? Should I keep the previous years data in STAGE and only source the current years data?


Solution

  • I would propose that you add a technical column "IsValid" in your fact table, set by default to 0. Then your loading process could be as follows :

    Step 1 UPDATE FactLearnerAbsenteism SET IsValid = 0 WHERE SchoolYear = @CurrentYear

    Step 2 Perform the Merge, and set Isvalid to 1 for each updated or inserted record.

    Step 3 Decide what you want to do with the invalidated records : either you delete them in a last pass, or you keep them as it can represent an interesting piece of information enabling you to track the deletions in the source systems.

    Disadvantage is that analysts need to be aware of this isValid parameter in their queries, to avoid overestimating the absentees, but I tend to prefer this as it captures the maximum information.