I have an abstract data modelling question. I have the following 4 tables:
Programme - a programme is a building project. e.g. Eiffel Tower building project. Programme ID is the unique identifier.
Work Stream - A programme has many workstreams. There is a work stream for each of the different items involved when completing a building (electricity workstream, windows workstream, plumbing workstream etc.). Work Stream ID is the unique identifier
Task - A work stream has many tasks. TASK ID is the unique identifier. This table also has start and end dates.
Cost - This table has a date column, TASK ID and Cost. Task ID is not a unique identifier, because costs can be charged to the same Task ID over several days.
I have two options to model the data, I would prefer option 2. However, I've been told that this is bad design. Can somebody please help me understand why option 1 would be better than 2?
This is the Star (option 1) vs Snowflake (option 2) dimension question. Both approaches are common, and de-normalizing the model so the fact table has all the dimension keys instead of having to traverse multiple dimensions is a performance optimization.
This is true in Power BI too, where the Star schema may perform better. However there's a tradeoff, as Programme and Work Stream won't flow filter to Tasks in the star schema design.
So the more common design would be to flatten the Programme, Workstream and Task into a single table. Flattening them into a single dimension preserves the relationship between the Work Stream and the task so you can see all the Tasks under a single Workstream.
But both designs will work fine, and perform similarly for many scenarios.