databasedata-modelingdata-warehousedimensional-modelingfact-table

In Data Warehousing, Should a Data Model have less+large Dimension Tables or split into more+small Dimension Tables?


I've just started learning Data Modeling in DW. And now I'm a bit confused about choosing with field to build a Dimension Table and whether should I split into many DIM tables For example, I have a Employee Table and it's columns as below:

    EmployeeID
    FirstName
    LastName
    State
    Age
    Department
    Position
    Skin Color
    Hair Color
    Sale Amount (Fact)

Emp_DIM table content every column above except Sale Amount

and a Fact table that contains

EmployeeID (Surrogate Key)
Sale Amount (Fact)

Ok, put the Date DIM aside. I want to ask about the Emp_DIM table

  1. Should I add Department_DIM(ID, Name, HeadPerson), State_DIM (ID, Name, StateCode), Position_DIM (ID, Name, Description), and Color_DIM (ID, Name) to represent Department, State, Position, Color? Then the Fact_Table will also have DepartmentID, StateID,... in it. Because sometimes I saw Department_DIM, Position_DIM and State_DIM table on the internet
  2. How can I know which attribute should be a separate DIM_table? Because in the above example, I can still query everything like Total Sale Amount per Department, per State, per Position,... with SQL and join with 1 DIM table only.

According to what I read on the internet, the DIM table should have as many fields as possible, and contain descriptive information. But I'm still confused and not sure.


Solution

  • Should I add Department_DIM

    If a Department has other attributes you need to report on, then you should consider splitting it out. This "normalization of dimensions" is also called "snowflaking", or moving from a star schema to a snowflake schema. But it's totally optional and people do it both ways.

    Then the Fact_Table will also have DepartmentID

    This a separate decision. This is denormalizing by putting non-key dimension attributes directly on your fact table, and it's a performance optimization. You might do this for a few key attributes.