I came across a data warehousing practice question. I tried to think of various ways to design this but I'm not sure what the best practices are. The question deals with designing a data warehouse for a parking lot and writing a SQL query for getting the parking fees.
The constraints are as follows:
Weekday hourly rates
Two wheeler - 1$
Four wheeler - 2$
Weekend hourly rates
Two wheeler - 2$
Four wheeler - 3$
A car is parked from Friday morning 9am till Saturday 10am. Design a data warehouse to stored this data and write a SQL to get the parking fees for a vehicle.
I could only think of below two ways of representing it,
Having a date_id, time_id and a type. Querying the parking fees can be difficult here since we do not have data at the grain of an hour. Difficult to calculate parking fees but consume less data
fact_parking_lot_data
fact_key | vehicle_id | date_id | time_id | type |
---|---|---|---|---|
1 | 1 | 20220506 | 9 | in |
2 | 1 | 20220507 | 22 | out |
Having a date_id, time_id for each hour of the day. This would created multiple fact table entries for the vehicle, if the vehicle is parked for 2 days then it would have 48 records. Easy to calculate parking fees but consume lot of storage
fact_parking_lot_data
fact_key | vehicle_id | date_id | time_id |
---|---|---|---|
1 | 1 | 20220506 | 9 |
2 | 1 | 20220506 | 10 |
3 | 1 | 20220506 | 11 |
4 | 1 | 20220506 | 12 |
. | . | . | . |
. | . | . | . |
. | . | . | . |
26 | 1 | 20220507 | 10 |
Any thoughts or suggestion would be really appreciated. Thank you !
Your model is a clear cut example of an accumulating snapshot table: foreign keys to dimensions would be vehicle_id, date_in_id, date_out_id, time_in_id and time_out_id. And as a measure the duration of parking.
When a car comes in date_in_id and time_in_id are populated, but not date_out_id, time_out_id nor duration. When the car leaves then the date_out_id, time_out_id and duration are populated.
That gives you a natural metric to calculate: the total duration across a day, or several days.
The disadvantage of an accumulating snapshot is that it requires lookups on the fact table and updates on "out" events, but I'm guessing your fact table won't be too large (you don't have a location_id in your model so I'm assuming we're talking a few hundred cars per day, maybe up to a couple thousand).
If you're not happy with the accumulating snapshot then my preference between the two models you suggest goes to the 2nd one, where each hour is populated while the car is parked.
Now, a few remarks:
A disadvantage of the accumulating snapshot vs your 2nd model (with 1 row per time period): there's no easy way to count how many cars are parked at a given time. You'll have to count rows across the entire fact table where
(date_in_id < X or (date_in_id = X and time_in_id <= Y))
and
(date_out_id > X or (date_out_id = X and time_out_id > Y))
Whereas your model allows you to quickly count how many cars are parked at any given moment by just counting all those with
date_id = X and time_id = Y