I am currently developing a Data Warehouse and I was wondering if it makes sense to have a dimension connected to another via a foreign key.
For example, let's say we have two dimensions 'Country' and 'City' should we store only the City Dimension Key in the fact table. And the City is aware of it's Country.
Or does it make more sense two store both foreign keys in the fact table.
But the City dimension will have to know which Country it belongs to (it looks like it defies the star schema, since we now also have links between dimensions)
Or is this purely a design choice and will have no impact in querying, etc?
Not a straight answerm but consider these two scenarios;
factTransactionA >- dimCity
factTransactionA >- dimCity >- dimCountry
Both are valid but consider....
When you're not sure of a design decision.... look for other constraints or requirements that help you make a decision
For case B you have to have a Country dimension. You shouldn't for example "overload" the city dimension and try to make it fit a fact table at the grain of Country. So you know you must have this:
factTransactionB >- Country dimension table
So if I extend this explanation on the fly.... typically, you use "conformed" dimensions between fact tables, so when we consider both fact tables, I would actually suggest this type of schema:
factTransaction2 >- dimCountry -< factTransaction1 >- dimCity
Rather than this
factTransaction2 >- dimCountry -< dimCity -< factTransaction1
This actually means baking the dimCountry surrogate key into factTransaction1
which is actually at City level.
Because
So I feel in this long winded explanation I have posed one reason to avoid snowflake schemas, but they are definitely valid