databasedata-warehousedimensional-modelingstar-schemafact-table

Fact table updatable/deletable rows


AFAIK, the best practices say that you should never updated fact table rows, at least for transaction and periodic snapshot grains. While reading about Fact Table Surrogate Key, found a notion of updates:

Certain ETL techniques for updating fact rows are only feasible if a surrogate key is assigned to the fact rows. Specifically, one technique for loading updates to fact rows is to insert the rows to be updated as new rows, then to delete the original rows as a second step as a single transaction. The advantages of this technique from an ETL perspective are improved load performance, improved recovery capability and improved audit capabilities. The surrogate key for the fact table rows is required as multiple identical primary keys will often exist for the old and new versions of the updated fact rows between the time of the insert of the updated row and the delete of the old row.

Did Bob Becker mean updates/deletes from fact table? Is it a common practice?


Solution

  • Time to time you may need to update a fact table, simple from the reason that wrong data were loaded.

    Not sure how a surrogate key can help here - you must find the rows to be changed based on the natural keys.

    But yes, an INSERT and DELETE (possible only logical delete setting a cancellation flag) could be prefered to a simple UPDATE basically from auditibility and recovery reasons. Again I'm not sure how this can affect performance.

    Above all, a best practice of 2006 need not be a current best practice - nowadays a non-trivial fact tables typically do not have a primary key as a unique index makes the rolling partition window concept harder. (If required, the uniqueness is checked in ETL process).