database-designprimary-keydata-warehousefact-table

Why primary key is (not) required on fact table in dimensional modelling?


I have heard a few references that pk is not required on fact table. I believe every single table should have a pk.

How could a person understand a row in a fact table if there is no pk and 10+ foreign keys.


Solution

  • Primary Key is there

    ... but Enforcing the primary key constraint in database level is not required.

    If you think about this, technically a unique key or primary key is a key that uniquely defines the characteristics of each row. And it can be composed of more than one attributes of that entity. Now in the case of a Fact table, foreign keys flowing-in from the other dimension tables together already act as a compounded primary key. And these foreign-key combinations can uniquely identify each record in the fact table. So, this foreign key combination is the primary key for the fact table.

    Why not a Surrogate Key then?

    Now if you wanted, you could have defined one surrogate key for the fact table. But what purpose would that serve? You are never going to retrieve one record from that fact table referring its surrogate key (use Indexes instead). Neither you are going to use that surrogate key to join the fact with other tables. Such a surrogate key will be completely waste of space in the database.

    Enforcing Database Constraints

    When you define this conceptual primary key in the database level, database needs to ensure that this constraint is not getting violated in any of the DML operation performed over it. Ensuring this constraint is a overhead for your database. It might be insignificant for an OLTP system, but for a large OLAP system where data are loaded in batch, this may incur significant performance penalties. Beside, why do you want your database to ensure the integrity of the constraints when you can ensure the same during the data loading phase itself (typically through your ETL coding).