With an example table (> 20M rows) like this :
CREATE TABLE T (
D DATE,
V INT
);
INSERT INTO T VALUES ('2024-07-01', 1), ('2024-07-02', 2), ('2024-07-02', 3);
I have an existing query that take about 5s :
SELECT D, 'SUM', SUM(V)
FROM T
GROUP BY D
UNION
SELECT D, 'AVG', AVG(V)
FROM T
GROUP BY D
ORDER BY 1;
And needed output :
Column A | Column B | Column B |
---|---|---|
2024-07-01 | SUM | 1.0 |
2024-07-01 | AVG | 1.0 |
2024-07-02 | SUM | 5.0 |
2024-07-02 | AVG | 2.5 |
To avoid multiple scan I rewrite like this (about 1 s) :
SELECT D, SUM(V), AVG(V)
FROM T
GROUP BY D;
I need to keep output, can only do it in one query, so I tried a common table expression :
WITH CTE AS (
SELECT D, SUM(V) AS S, AVG(V) AS A
FROM T
GROUP BY D
)
SELECT D, 'SUM', S
FROM CTE
UNION
SELECT D, 'AVG', A
FROM CTE
ORDER BY 1;
But table is still scan twice, and query is still in 5s :
select_type | table |
---|---|
PRIMARY | < derived2> |
DERIVED | T |
UNION | < derived4> |
DERIVED | T |
UNION RESULT | <union1,3> |
Is there an option to do it in one query and only one scan ?
EDIT:
Found a solution with json but I don't like it :
WITH CTE AS (
SELECT D, JSON_ARRAY(SUM(V), AVG(V)) AS data
FROM T
GROUP BY D
)
SELECT
c.D,
CASE WHEN JT.Id = 1 THEN 'SUM'
WHEN JT.Id = 2 THEN 'AVG'
END AS F,
JT.N
FROM CTE c,
JSON_TABLE(c.data, '$[*]'
COLUMNS(
Id for ordinality,
N FLOAT PATH '$[0]'
)
) AS JT;
I would use a subquery for this:
select D, rowtype, case rowtype when 'SUM' then sum_v else avg_v end value
from (
select D, sum(V) sum_v, avg(V) avg_v
from T
group by D
) sum_or_avg
join (select 'SUM' rowtype union all select 'AVG') rowtype
order by D, rowtype desc