databasefactwarehouse

Design of fact table


My question is about modeling of fact_table in data warehouse. for example , we have users that subscribe to different subjects and we want to track when they started subscription. each user belongs to specific department. and users can change their department. there can be two designs for fact table:

+----------+------------------+-----------------+---------------+------------+
| user_key | subject_key      |  department_key |   start_Date  |  end_date  |
+----------+------------------+-----------------+---------------+------------+
|        1 |               10 |             123 |    2017-09-10 | 2017-09-25 |
|        2 |               11 |              90 |    2017-09-20 | 9999-12-29 |
+----------+------------------+-----------------+---------------+------------+

it means that user subscribed to subject 10 on 2017-09-10 and unsubscribe on 2017-09-25

the other design is to remove department_key from design.

+----------+------------------+---------------+------------+
| user_key |   department_key |   start_Date  |  end_date  |
+----------+------------------+---------------+------------+
|        1 |              123 |    2017-09-10 | 2017-09-25 |
|        2 |               90 |    2017-09-20 | 9999-12-29 |
+----------+------------------+---------------+------------+

and the aggregation table is something like this:

+---------+-----------+---------------+------------------+
| user_id | user_name |  subject_name | department_anem  |
+---------+-----------+---------------+------------------+
|       1 |    john   | politics      |  sales           |
|       2 |    Mark   | sport         |   marketing      |
+---------+-----------+---------------+------------------+

The problem is that, the department can change for the user. and we want the current department of the user in aggregation, the question is should I include the department_key in fact table and update it everytime the user changes its department or the logic has to be handled in aggregation? Is the fact table without other dimension keys except subject key is a "really" a fact table?

Thanks


Solution

  • Referring to the first example you provided.

    This looks a lot like a "factless fact table": https://www.1keydata.com/datawarehousing/factless-fact-table.html

    Alternatively: With the subject_key removed it appears to be an SCD type 2 dimension table because start_date and end_date are recorded and it would not contain measures (see the Wiki entry for type 2 slowly changing dimensions, below):

    https://en.wikipedia.org/wiki/Slowly_changing_dimension

    We could name your table dim_user_dept_history (the intersection of dim_user and dim_dept, dim_date). Columns: user_key, dept_key, start_date, end_date, current_flag

    And for tracking the measures, a fact table:

    fact_table Columns: user_key, subject_key, current_dept_key, click_timestamp, date_dim_key

    Perhaps some other measures that might go with subject_key, like page_key (if they clicked on a help page for that topic in your local wiki, for example).

    "update it everytime the user changes its department or the logic has to be handled in aggregation?" Updating fact tables is considered bad practice in data warehouses. Update the dimensions instead, and in most cases do that with SCD type 2 so history is preserved. An SCD Type 2 dim allows other questions to be answered, like, "How often to people change departments?" You could answer that question with the fact table, but the dim has far fewer rows to scan.