data-modelingdata-warehousedimensional-modelingdata-vault

Why using sequence number against version number in dimension table by datawarehouse modeling


In the context of dimension modeling, as typical case, it is great to have surrogate key in dimension table to track the change of rows (http://www.kimballgroup.com/2006/07/design-tip-81-fact-table-surrogate-key/).

There are three common ways to realize surrogate key 1) sequence number 2) version number 3) hash key (used by data vault)

My question is: why sequence number is preferred in most of dimension modeling I have seen.


Solution

  • I think there are a couple of reasons why sequence number is generally used, but I don't think it's a clearly superior way of doing things in all situations.

    Sequence Number

    Pros

    Cons

    Version Number

    I haven't seen examples of this before, and googling about it seems to turn up this question and some references to appending it to existing fields, so I'm going to assume that you're talking about appending the version to a sequence or hash, or some other identifier.

    Pros

    Cons

    Hash

    If you're not going to use a sequence number, this is probably my preferred option. Needs some pretty specific circumstances though I think

    Pros

    Conclusion

    So it depends on your situation, but sequence number is just so easy to implement and the cons are almost completely negligible in almost all situations, to the point where it sits as a comfortable default. Therefore choosing another option usually falls into the "you have to explain why you did it" category.