data-warehousedimensional-modelingstar-schemastar-schema-datawarehouse

Type 2 Slowly Changing Dimensions and Querying for Historical data at a point in time


I have a small star schema representing time entries for issues in Jira. I have an IssueAttributes type 2 dimension table, and then a TimeEntry fact table.

Simplified view of dimension table:

+----------+---------+-------------+-----------+-----------+------------+----------+
| IssueKey | IssueID | IssueNumber | IssueName | IsCurrent | ValidFrom  | ValidTo  |
+----------+---------+-------------+-----------+-----------+------------+----------+
| 1        | 123456  | PR-1000     | Original  |  0        | 2000-01-01 |2020-04-09|
+----------+---------+-------------+-----------+-----------+------------+----------+
| 2        | 123456  | PR-1000     | Changed   |  1        | 2020-04-10 |9999-1231 |
+----------+---------+-------------+-----------+-----------+------------+----------+

Simplified view of Fact Table:

+--------------+---------+-----------+
| TimeEntryKey | IssueKey| TimeEntry |
+--------------+---------+-----------+
| 11111        | 1       | 1.25      |
+--------------+---------+-----------+
| 11112        | 2       | 1.5       |
+--------------+---------+-----------+

When I'm inserting into the fact table, I'm using whatever is the current IssueKey from the dimension table, which seems like the right approach. But if I want to get a SUM of the time entries and group them by the IssueName, it would result in 2 rows as the name has changed between the first and second rows. I'm under the impression that it's best to keep simple joins and use the keys, but in this instance it seems like you'd need to first join to the Dimension on the IssueKey, and then join that to the dimension again on the IssueNumber and IsCurrent = 1 in order to get the attributes for the current version of data. I have no problem doing this, but I'm also under the understanding that joins should be simple in a DW so the end user doesn't have to think through how those joins work, and this seems to contradict that understanding. Am I thinking about this correctly? You're not supposed to go back and update the FKs in the fact table, right? Do I need a conformed dimension or something to keep certain attributes consistent over time?

My current approach is to put the gymnastics in a view so the end user can just reference the view and doesn't have to think through all the joins. Am I thinking about this the wrong way?


Solution

  • I think your TimeEntry fact is a transactional fact table. I would suggest you add the following columns

    Now when you do aggregation, you would need to only look at the current rows in the fact table; your latest rows will also only look at the latest dim rows which you have already implemented. You would also not have to go and update a historical record in case there is a change in the source system. You might need to retire the old rows and add a new row with keys to the latest attribute.

    This might solve your problem.