I have a table that my daily script sends the findings to. It looks something like this:
product | issues_found | run_date |
---|---|---|
Alpha | 2 | 2024-8-12 |
Alpha | 5 | 2024-8-11 |
Alpha | 3 | 2024-8-10 |
Alpha | 0 | 2024-8-9 |
Alpha | 1 | 2024-8-8 |
Alpha | 5 | 2024-8-7 |
Beta | 4 | 2024-8-12 |
Beta | 3 | 2024-8-11 |
I want to find out the date when the current set of issues started for every product (i.e., the latest date when issues_found
turned from 0 to non-zero), and add it to another column. So the table should become something like:
product | issues_found | run_date | issues_started |
---|---|---|---|
Alpha | 2 | 2024-8-12 | 2024-8-10 |
Alpha | 5 | 2024-8-11 | 2024-8-10 |
Alpha | 3 | 2024-8-10 | 2024-8-10 |
Alpha | 0 | 2024-8-9 | 2024-8-10 |
Alpha | 1 | 2024-8-8 | 2024-8-10 |
Alpha | 5 | 2024-8-7 | 2024-8-10 |
Beta | 4 | 2024-8-12 | 2024-8-11 |
Beta | 3 | 2024-8-11 | 2024-8-11 |
My approach is to iterate over date-sorted records for every product and break when 0 is encountered:
DECLARE found_date DATE;
FOR row in (SELECT * FROM my_table)
DO
SET found_date = NULL;
FOR historical_entry in (SELECT * FROM my_table WHERE product = row.product ORDER BY run_date DESC)
DO
IF historical_entry.issues_found <> 0 THEN
SET found_date = historical_entry.run_date;
ELSE
BREAK;
END IF;
END FOR;
UPDATE my_table SET issues_started = found_date where product = row.product;
END FOR;
It's a very procedural approach and although it works, it takes hours when run against hundreds of products and thousands of records. Is there a better approach than two nested loops? I tried using MIN(run_date)
over aggregates of non-zero issues_found
rows but couldn't quite get it right.
Consider below approach
WITH issue_starts AS (
SELECT
product,
run_date,
issues_found,
CASE
WHEN issues_found <> 0 AND IFNULL(LAG(issues_found) OVER (PARTITION BY product ORDER BY run_date), 0) = 0
THEN run_date
END AS start_of_issues
FROM your_table
), latest_issue_start AS (
SELECT
product,
run_date,
issues_found,
MAX(start_of_issues) OVER (PARTITION BY product ORDER BY run_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS issues_started
FROM issue_starts
)
SELECT
t.product,
t.issues_found,
t.run_date,
l.issues_started
FROM your_table t
JOIN latest_issue_start l
ON t.product = l.product AND t.run_date = l.run_date
ORDER BY t.product, t.run_date DESC
if applied to sample data in your question - output is