data-warehousedimensional-modelingscdtype-2-dimension

Type II SCD with entities that merge over time


Let's say we have a dimension that represents sales offices. The offices might move, which would be a type II change. We'd want to track operations that happened at the old office location, and operations that now happen at the new, and know when the change happened. So far, just standard type II design. Now let's say an office merges with another office. That is, the operational activity of two originally distinct offices (the "parent offices") is now taking place in a single office (the "merged office"), which might be a continuation (physically or in terms of staff) of either one of the original offices, or it might be a new office altogether that is, from a business point of view, a continuation of the previous two.

The reporting/analysis requirements are as follows:

I'm not sure how to model this with any of the SCD types. If we simply replace the two parent office entries with a single new one, and update all the fact tables accordingly, we have a type I change. That allows us to see current activity just fine, but we lose history. If we keep the records separate, we won't know about the merger. If we add a third record to represent the merged office, we also lose history (which natural key would it have? neither of the parent offices' natural keys would be suitable).

Do I need to use a bridge/many-to-many table? That introduces complexity that I'd like to avoid. However, if that's the best way to do it, then so be it. I'm still not sure, however, how that would be structured. Perhaps the fact table would point to an office entry, and the offices would be grouped in a many-to-many fashion. Reporting would be done based on groups, rather than directly on the office dimension.

Answers to ElectricLlama's Questions


Solution

  • I would prefer the simplest possible solution that the customer could accept, so I would do the following. I would provide two office field in the office dimension:

    1. Office_as_today
    2. Office_original

    (of course you have to pick names that are good for your customer) At the start the two fields would be set equal. When two office merges I would go back in the two original offices and update the Office_as_today field, with the name of the merged office.

    The new facts (from the merge on) will be registered with a new row with the two fields again equal.

    The solution is very simple and fulfill almost all the requirement, with the exception of being unable to follow the original offices after the merge (here I underline your "at least").