I have a couple of tables like so:
CREATE TABLE cycles (
`cycle` varchar(6) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`cycle_type` varchar(140) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`start` date DEFAULT NULL,
`end` date DEFAULT NULL
);
CREATE TABLE rsvn (
`str` varchar(140) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`start_date` date DEFAULT NULL,
`end_date` date DEFAULT NULL
);
INSERT INTO `cycles` (`cycle`, `cycle_type`, `start`, `end`) values
('202013', 'a', '2021-01-04', '2021-01-31'),
('202013', 'b', '2021-01-04', '2021-01-31'),
('202101', 'a', '2021-01-04', '2021-01-31'),
('202101', 'b', '2021-01-04', '2021-01-31'),
('202102', 'a', '2021-02-01', '2021-02-28'),
('202102', 'b', '2021-02-01', '2021-02-28'),
('202103', 'a', '2021-03-01', '2021-03-28'),
('202103', 'b', '2021-03-01', '2021-03-28');
INSERT INTO `rsvn` (str, start_date, end_date) values
('STR01367', '2020-12-07', '2020-06-21'),
('STR00759', '2020-12-07', '2021-04-25'),
('STR01367', '2021-01-04', '2021-09-12'),
('STR01367', '2021-06-21', '2022-02-27');
For any given cycle, I want to count the number of occurrences of str across cycles. So between cycle 2108 - 2108 (one cycle), I see:
str | count |
---|---|
STR01367 | 1 |
STR00759 | 1 |
And from between 2108 - 2109 (two cycles) I see:
str | count |
---|---|
STR01367 | 2 |
STR00759 | 1 |
I'm trying to figure out how to dynamically obtain those results. I don't see any options outside a UNION ALL query (one query for each cycles), so I tried writing a PROCEDURE. However, that didn't work because I want to do post-processing on the query results, and I don't believe you can use the results of a PROCEDURE in a CTE or subquery.
My PROCEDURE (works, can't include results in a subquery like SELECT * FROM call count_cycles
(?)):
CREATE PROCEDURE `count_cycles`(start_cycle CHAR(6), end_cycle CHAR(6))
BEGIN
SET @cycles := (
SELECT CONCAT('WITH installed_cycles_count AS (',
GROUP_CONCAT(
CONCAT('
SELECT rsvn.str, 1 AS installed_cycles
FROM rsvn
WHERE "', `cy`.`start`, '" BETWEEN rsvn.start_date AND COALESCE(rsvn.end_date, "9999-01-01")
OR "', `cy`.`end`, '" BETWEEN rsvn.start_date AND COALESCE(rsvn.end_date, "9999-01-01")
GROUP BY rsvn.str
'
)
SEPARATOR ' UNION ALL '
),
')
SELECT
store.chain AS "Chain"
,store.division AS "Division"
,dividers_store AS "Store"
,SUM(installed_cycles) AS "Installed Cycles"
FROM installed_cycles_count r
LEFT JOIN store ON store.name = r.dividers_store
GROUP BY dividers_store
ORDER BY chain, division, dividers_store, installed_cycles'
)
FROM cycles `cy`
WHERE `cy`.`cycle_type` = 'Ad Cycle'
AND `cy`.`cycle` >= CONCAT('20', RIGHT(start_cycle, 4))
AND `cy`.`cycle` <= CONCAT('20', RIGHT(end_cycle, 4))
GROUP BY `cy`.`cycle_type`
);
EXECUTE IMMEDIATE @cycles;
END
Alternatively, I attempted to use a recursive query to obtain my results by incrementing my cycle. This gave me the cycles I wanted:
WITH RECURSIVE xyz AS (
SELECT cy.`cycle`, cy.`start`, cy.`end`
FROM cycles cy
WHERE cycle_type = 'Ad Cycle'
AND `cycle` = '202101'
UNION ALL
SELECT cy.`cycle`, cy.`start`, cy.`end`
FROM xyz
JOIN cycles cy
ON cy.`cycle` = increment_cycle(xyz.`cycle`, 1)
AND cy.`cycle_type` = 'Ad Cycle'
WHERE cy.`cycle` <= '202110'
)
SELECT * FROM xyz;
But I can't get it working when I add in the reservations table: infinite loop?
WITH RECURSIVE xyz AS (
SELECT cy.`cycle`, 'dr.dividers_store', 1 AS installed_cycles
FROM cycles cy
LEFT JOIN rsvn dr
ON cy.`start` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
OR cy.`end` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
WHERE cy.`cycle_type` = 'Ad Cycle'
AND cy.`cycle` = '202101'
UNION ALL
SELECT cy.`cycle`, 'dr.dividers_store', 1 AS installed_cycles
FROM xyz
JOIN cycles cy
ON cy.`cycle` = increment_cycle(xyz.`cycle`, 1)
AND cy.`cycle_type` = 'Ad Cycle'
LEFT JOIN rsvn dr
ON cy.`start` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
OR cy.`end` BETWEEN dr.start_date AND COALESCE(dr.end_date, "9999-01-01")
WHERE cy.`cycle` <= '202102'
)
SELECT * FROM xyz
What options do I have to get the results I need, in such a way that I can use them in a CTE or subquery?
The results I am looking for are easily obtained via a two-stage grouping. Something like this:
WITH sbc AS (
SELECT cy.`cycle`, dr.str, 1 AS 'count'
FROM cycles cy
LEFT JOIN rsvn dr
ON cy.`start` BETWEEN dr.start_date AND dr.end_date
OR cy.`end` BETWEEN dr.start_date AND dr.end_date
WHERE cy.`cycle_type` = 'Ad Cycle'
AND cy.`cycle` BETWEEN '202201' AND '202205'
GROUP BY cy.`cycle`, dr.str
ORDER BY dr.str, cy.`cycle`
)
SELECT `cycle`, str, SUM(`count`) as `count`
FROM sbc
GROUP BY str
The CTE produces one result per rsvn per cycle. Afterwards all that is needed is to group by store and count the number of occurrences.
Besides being simpler, I suspect that this query is faster than the union concept I was stuck on when I asked the question, since among other things the server does not need to perform a union on multiple grouping queries. However, I do not understand how MariaDB optimizes such queries, and while I am curious I don't have the time to run benchmarks to find out.