I have a dataset structured as follows which contains thousands of materials and millions of rows:
Material | Date | Stock | Demand | Supply | Projected Stock | Target Stock | Overstock |
---|---|---|---|---|---|---|---|
123456 | 24/06/2024 | 60 | 0 | 0 | 60 | 24 | Yes |
123456 | 26/06/2024 | 0 | 4 | 0 | 56 | 24 | Yes |
123456 | 27/06/2024 | 0 | 4 | 0 | 52 | 24 | Yes |
123456 | 02/07/2024 | 0 | 0 | 10 | 62 | 24 | Yes |
123456 | 04/07/2024 | 0 | 0 | 10 | 72 | 24 | Yes |
123456 | 04/07/2024 | 0 | 0 | 1 | 73 | 24 | Yes |
123456 | 04/07/2024 | 0 | 1 | 0 | 72 | 24 | Yes |
123456 | 04/07/2024 | 0 | 1 | 0 | 71 | 24 | Yes |
123456 | 04/07/2024 | 0 | 4 | 0 | 67 | 24 | Yes |
123456 | 04/07/2024 | 0 | 4 | 0 | 63 | 24 | Yes |
123456 | 05/07/2024 | 0 | 4 | 0 | 59 | 24 | Yes |
123456 | 11/07/2024 | 0 | 0 | 11 | 70 | 24 | Yes |
123456 | 13/07/2024 | 0 | 1 | 0 | 69 | 24 | Yes |
123456 | 13/07/2024 | 0 | 1 | 0 | 68 | 24 | Yes |
123456 | 13/07/2024 | 0 | 1 | 0 | 67 | 24 | Yes |
123456 | 14/07/2024 | 0 | 4 | 0 | 63 | 24 | Yes |
123456 | 16/07/2024 | 0 | 4 | 0 | 59 | 24 | Yes |
123456 | 18/07/2024 | 0 | 0 | 11 | 70 | 24 | Yes |
123456 | 18/07/2024 | 0 | 4 | 0 | 66 | 24 | Yes |
123456 | 19/07/2024 | 0 | 1 | 0 | 65 | 24 | Yes |
123456 | 19/07/2024 | 0 | 1 | 0 | 64 | 24 | Yes |
123456 | 21/07/2024 | 0 | 4 | 0 | 60 | 24 | Yes |
123456 | 21/07/2024 | 0 | 1 | 0 | 59 | 24 | Yes |
123456 | 21/07/2024 | 0 | 1 | 0 | 58 | 24 | Yes |
123456 | 21/07/2024 | 0 | 1 | 0 | 57 | 24 | Yes |
123456 | 25/07/2024 | 0 | 0 | 9 | 66 | 24 | Yes |
123456 | 25/07/2024 | 0 | 4 | 0 | 62 | 24 | Yes |
123456 | 30/07/2024 | 0 | 0 | 9 | 71 | 24 | Yes |
123456 | 30/07/2024 | 0 | 0 | 9 | 80 | 24 | Yes |
123456 | 30/07/2024 | 0 | 0 | 9 | 89 | 24 | Yes |
123456 | 30/07/2024 | 0 | 0 | 9 | 98 | 24 | Yes |
123456 | 30/07/2024 | 0 | 0 | 9 | 107 | 24 | Yes |
…. | …. | …. | …. | …. | …. | …. | …. |
123457 | 24/06/2024 | 30 | 0 | 0 | 30 | 24 | No |
123457 | 26/06/2024 | 0 | 4 | 0 | 26 | 24 | No |
123457 | 27/06/2024 | 0 | 4 | 0 | 22 | 24 | No |
123457 | 02/07/2024 | 0 | 0 | 10 | 32 | 24 | No |
123457 | 04/07/2024 | 0 | 0 | 10 | 42 | 24 | No |
123457 | 04/07/2024 | 0 | 0 | 1 | 43 | 24 | No |
123457 | 04/07/2024 | 0 | 1 | 0 | 42 | 24 | No |
123457 | 04/07/2024 | 0 | 1 | 0 | 41 | 24 | No |
123457 | 04/07/2024 | 0 | 4 | 0 | 37 | 24 | No |
123457 | 04/07/2024 | 0 | 4 | 0 | 33 | 24 | No |
123457 | 05/07/2024 | 0 | 4 | 0 | 29 | 24 | No |
123457 | 11/07/2024 | 0 | 0 | 11 | 40 | 24 | No |
123457 | 13/07/2024 | 0 | 1 | 0 | 39 | 24 | No |
123457 | 13/07/2024 | 0 | 1 | 0 | 38 | 24 | No |
123457 | 13/07/2024 | 0 | 1 | 0 | 37 | 24 | No |
123457 | 14/07/2024 | 0 | 4 | 0 | 33 | 24 | No |
123457 | 16/07/2024 | 0 | 4 | 0 | 29 | 24 | No |
123457 | 18/07/2024 | 0 | 0 | 11 | 40 | 24 | No |
123457 | 18/07/2024 | 0 | 4 | 0 | 36 | 24 | No |
123457 | 19/07/2024 | 0 | 1 | 0 | 35 | 24 | No |
123457 | 19/07/2024 | 0 | 1 | 0 | 34 | 24 | No |
123457 | 21/07/2024 | 0 | 4 | 0 | 30 | 24 | No |
123457 | 21/07/2024 | 0 | 1 | 0 | 29 | 24 | No |
123457 | 21/07/2024 | 0 | 1 | 0 | 28 | 24 | No |
123457 | 21/07/2024 | 0 | 1 | 0 | 27 | 24 | No |
123457 | 25/07/2024 | 0 | 0 | 9 | 36 | 24 | No |
123457 | 25/07/2024 | 0 | 4 | 0 | 32 | 24 | No |
123457 | 30/07/2024 | 0 | 0 | 9 | 41 | 24 | No |
123457 | 30/07/2024 | 0 | 0 | 9 | 50 | 24 | ? |
123457 | 30/07/2024 | 0 | 0 | 9 | 59 | 24 | ? |
123457 | 30/07/2024 | 0 | 0 | 9 | 68 | 24 | ? |
123457 | 30/07/2024 | 0 | 0 | 9 | 77 | 24 | ? |
…. | …. | …. | …. | …. | …. | …. | …. |
Because the date column skips and repeats some dates, it is difficult for me to do what I want.
I need to calculate whether the projected stock is 200% or more for every 28 consecutive days. So in the example above, the projected stock is 200% above the target stock (24*200%=48), so there needs to be a column which checks the current row, and looks in the next 28 days (so not 28 rows) and if every day of those 28 days the projected stock is above 48, then the relevant material is flagged. For reference, my query currently uses CTE to convert the raw data into below dataset.
Projected_stock as (
select
Material,
Date,
Stock,
Target_Stock,
Demand,
Supply,
sum(Stock - Demand + Supply) over (partition by material order by date, rows between unbounded preceding and current row) as Projected_Stock
from datasource
group by all
order by material,date)
Thank you.
Edit: I am using databricks and what I expect to see is that for this material 123456, next to the column Target_Stock, there is a another column Overstock, which has a flag on all rows, because therea are more than 28 consecutive days where the projected stock is above 200% of the target stock (48)
Postgres and MySQL versions of the solution. DateAdd function may vary for you.
PostGres fiddle
WITH Projected_stock as
(
select
Material,
Date,
Stock,
Target_Stock,
Demand,
Supply,
sum(Stock - Demand + Supply) over (partition by material order by date, rows between unbounded preceding and current row) as Projected_Stock
from datasource
group by all
order by material,date),
CTE as
(
SELECT material, c.date,
CASE
WHEN MIN(CASE WHEN projected_stock>48 THEN '1' ELSE '0' END)!=MAX(CASE WHEN projected_stock>48 THEN '1' ELSE '0' END) THEN '?'
ELSE MIN(CASE WHEN projected_stock>48 THEN '1' ELSE '0' END)
END as Overstock
FROM Projected_stock c
WHERE c.date between c.date AND c.date + INTERVAL '28 day' --dateadd(DAY, 28, c.date)
GROUP BY material, date
),
SELECT p.*, c.overstock
FROM CTE c
LEFT JOIN Projected_stock p ON p.date=c.date and c.material=p.material
MySQL fiddle
WITH Projected_stock as
(
select
Material,
Date,
Stock,
Target_Stock,
Demand,
Supply,
sum(Stock - Demand + Supply) over (partition by material order by date, rows between unbounded preceding and current row) as Projected_Stock
from datasource
group by all
order by material,date), CTE as
(
SELECT material, c.Date,
CASE
WHEN MIN(CASE WHEN Projected>48 THEN '1' ELSE '0' END)!=MAX(CASE WHEN Projected>48 THEN '1' ELSE '0' END) THEN '?'
ELSE MIN(CASE WHEN Projected>48 THEN '1' ELSE '0' END)
END as Overstock
FROM Projected_stock c
WHERE c.Date between c.Date AND DATE_ADD(c.Date, INTERVAL 28 DAY)
GROUP BY c.material, c.Date
),
SELECT p.*, c.overstock
FROM CTE c
LEFT JOIN Projected_stock p ON p.Date=c.Date and c.material=p.material