I have a slowly changing type 2 price change table which I need to reduce the size of to improve performance. Often rows are written to the table even if no price change occurred (when some other dimensional field changed) and the result is that for any product the table could be 3-10x the size it needs to be if it were including only changes in price.
I'd like to compress the table so that it only has contains the first effective date and last expiration date for each price until that price changes that can also
As an example if i have this raw data:
Product | Price Effective Date | Price Expiration Date | Price |
---|---|---|---|
123456 | 6/22/18 | 9/19/18 | 120 |
123456 | 9/20/18 | 11/8/18 | 120 |
123456 | 11/9/18 | 11/29/18 | 120 |
123456 | 11/30/18 | 12/6/18 | 120 |
123456 | 12/7/18 | 12/19/18 | 85 |
123456 | 12/20/18 | 1/1/19 | 85 |
123456 | 1/2/19 | 2/19/19 | 85 |
123456 | 2/20/19 | 2/20/19 | 120 |
123456 | 2/21/19 | 3/19/19 | 85 |
123456 | 3/20/19 | 5/22/19 | 85 |
123456 | 5/23/19 | 10/10/19 | 85 |
123456 | 10/11/19 | 6/19/19 | 80 |
123456 | 6/20/20 | 12/31/99 | 80 |
I need to transform it into this:
Product | Price Effective Date | Price Expiration Date | Price |
---|---|---|---|
123456 | 6/22/18 | 12/6/18 | 120 |
123456 | 12/7/18 | 2/19/19 | 85 |
123456 | 2/20/19 | 2/20/19 | 120 |
123456 | 2/21/19 | 10/10/19 | 85 |
123456 | 10/11/19 | 12/31/99 | 80 |
This is a type of gaps-and-islands problem. I would recommend reconstructing the data, saving it in a temporary table, and then reloading the existing table.
The code to reconstruct the data is:
select product, price, min(effective_date), max(expiration_date)
from (select t.*,
sum(case when prev_expiration_date = effective_date - interval '1 day' then 0 else 1 end) over (partition by product order by effective_date) as grp
from (select t.*,
lag(expiration_date) over (partition by product, price order by effective_date) as prev_expiration_date
from t
) t
) t
group by product, price, grp;
Note that the logic for date arithmetic varies depending on the database.
Save this result into a temporary table, temp_t
or whatever, using select into
, create table as
, or whatever your database supports.
Then empty the current table and reload it:
truncate table t;
insert into t
select product, price, effective_date, expiration_date
from temp_t;
Notes:
truncate_table
!