database-designamazon-redshiftdata-warehousestar-schema

Using hash keys instead of integer keys on Redshift


I'm building a star schema model on AWS Redshift which contains dimensions and facts.The dimension tables contain integer values as keys typically using an identity column.This is being used in lookup while loading the fact table and populating the fact table with the dimension keys as required for every record. I have been told that we can use hash keys instead of using integer keys which would simplify the loading process as we do not have to do a lookup and instead just hash the value of natural/business key that is being used. Has anyone tried this approach and what are the pros and cons of using hashed keys as dimension keys in the data model.The models will further be used in Power BI for reporting, will there be an performance impact of using hash keys there.


Solution

  • The models will further be used in Power BI for reporting, will there be an performance impact of using hash keys there.

    Assuming that the output of the hash is an integer, then there will very little performance difference between the two approaches. You may get better compression on your fact table using incremental integers like 0x00000D3F0 instead of hash-generated ones like 0xE4AE353D, but I wouldn't expect it to be a big deal.