sql-serverdata-warehousestar-schemaschema-design

Data warehouse design - Multiple lookup values


Looking for inputs on a data warehouse schema design. Here is the scenario:

I have an Action Table and User Table that is currently joined based on UserId to get the details about the user who did an action.

Action Table:
    UserId   Action
    123      Test001

User Table:
    UserId    UserName
    123       Adam

Now, we have to migrate the users to a new user management system (UMS) and here is how it works:

  1. Existing users will be migrated to UMS and new UserId (let's call this as a Modern UserId and existing UserId as Legacy UserId) will be assigned. So, the new records coming-in for new actions will carry the new UserId.
  2. New users created in UMS will only have Modern UserId and Legacy UserId will be run.
  3. Migrated users will have both Legacy UserId and Modern UserId.

Now, when we do reporting, we have to expose both historic and new action data. Wondering what should be the ideal schema design so we can report both historic & new actions and map them to the right user.

Platform: SQL Server 2016, Analysis Services

Please let me know if you need more details.


Solution

  • Agreed with the previous answer. When your upstream team performs the migration to UMS, they should preserve in some way a mapping between Legacy User IDs and Modern User IDs. Downstream in the warehouse, I suggest that you keep both IDs in your User dimension table but generate a surrogate key in this table which will serve as a primary key (it can be just an incremental integer). This way you can use the surrogate key as a foreign key in your Action fact table whether the User is a modern or legacy one.

    Here is my proposal of data model design for your tables:

    DIM_USER
    - USER_KEY (pk)
    - USER_ID
    - USER_ID_LEGACY
    - USERNAME
    - ....
    
    DIM_ACTION
    - ACTION_KEY (pk)
    - ACTION
    - ....
    
    FACT_ACTION
    - USER_KEY
    - ACTION_KEY
    - ....