data-warehousebusiness-intelligencedimensional-modelingfact-tablenatural-key

Natural Key and Fact tables


I'm new on dimensional modelling I believe that you guys can help me in the following doubts.

In the production system I have a transaction table, sales table for example.The unique identifier is a primary key called SaleId. Example:

enter image description here

My doubt is when modelling the fact table should the SaleID be included in the fact table as a NaturalKey?

enter image description here

Also should the Fact table have a SurrogateKey?

Please feel free to send me any link as reference. Thanks in advance


Solution

  • Technically speaking, it is probably not a natural key - it does look system generated. However, sometimes it is very valid to store a system generated ID in a Fact for use as a Degenerate Dimension. Usually, these are cases where either the business users do have sight of this system generated ID (order numbers, invoice numbers, purchase order numbers, etc.), or where there's no other useful way of identifying some rows which can be usefully grouped together.

    If the users of your BI solutions are likely to want to be able to drill down into information and look at it by sale, then the SaleID might well be a good candidate for this treatment. Have a think whether there's any other way for them to get to this level - could a customer be associated with two distinct sales on the same day? If so, would your users want to look at them as two separate sales? You might need to speak to them to find out what's going to be useful for them. If for some reason you can't get a clear answer, I'd say keep it - there's little harm, and you can always remove it later if it's not used.

    Here's the Kimball group's take on Degenerate Dimensions, in case you're at all unclear on how they work:

    http://www.kimballgroup.com/2003/06/design-tip-46-another-look-at-degenerate-dimensions/


    As far as Fact table surrogate keys - I always use them. As Kimball's Design Tip #81 points out, they're sometimes useful, and it's the kind of thing I'd rather put in at the beginning and not use than realise later on that it would have been useful to have. Point 2 - where you might want to make updates by inserting new rows and deleting the old ones - certainly applies to work I've done.