relational-databasedatabase-schemadenormalizationdimensional-modelingoltp

Why we use Dimensional Model over Denormalized relational Model?


I am confused in some questions. I need their answers. If our relational model is also De-normalize then why we prefer dimensional model ? What is the reason we prefer dimensional model over relational model ? Your historical data can also stored in OLTP and you can perform reporting easily on any OLTP then why we use dimensional model and data warehouse ? What is the difference between a dimension and a de-normalized table ?

Thanks in advance


Solution

  • Short answer:

    If your lookups / retrievals from your OLTP tables are fast enough, and your specific search requirements do not have such complications as are described below, then there should not be a need to get into any dimensional star-schemas.

    Long answer:

    Dimensional and Denormalized models have different purposes. Dimensional models are generally used for data warehousing scenarios, and are particularly useful where super-fast query results are required for computed numbers such as "quarterly sales by region" or "by salesperson". Data is stored in the Dimensional model after pre-calculating these numbers, and updated as per some fixed schedule.

    But even without a data warehouse involved, a Dimensional model could be useful, and its purpose could complement that of the Denormalized model, as in the following example:

    A Dimensional model enables fast search. Joins between the dimension tables and the fact table are set up in a star-schema. Searching for John Smith would be simplified because we'll search for John OR Smith only in the relevant dimension table, and fetch the corresponding person ids from the fact table (fact table FKs point to dimension table PKs), thereby getting all persons with either of the 2 keywords in their name. (A further enhancement would enable us to search for all persons having variations of "John Smith" in their names e.g. John, Jon, Johnny, Jonathan, Smith, Psmith, Smythe by building snowflake dimensions.)

    A Denormalized model, on the other hand, enables fast retrieval, such as returning back a lot of columns about a specific item without having to join multiple tables together.

    So in the above scenario, we would first use the Dimensional model to get a set of IDs for the persons of our interest, and then use the Denormalized table to get full details of those selected IDs without having to do any further joins.

    This kind of a search would be very slow if we directly query the Denormalized tables, because a text search will need to be done on the PersonName column. It becomes even slower if we try to include the name variations, or if we need to add more search criteria.

    Excellent reference:

    An excellent reference for learning about the vast (and very interesting) topic of Dimensional Modeling is Ralph Kimball's The Data Warehouse Lifecycle Toolkit. Its companion volume The Data Warehouse Toolkit covers a large number of actual use cases.

    Hope this helps!