I've put together a simplified version of a query I'm struggling with:
WITH cte AS (
SELECT
num_serves_p1 AS num_serves, player_id_p1 AS player_id, location, date_
FROM
table1
UNION ALL SELECT
num_serves_p2 AS num_serves, player_id_p2 AS player_id, location, date_
FROM
table1
)
SELECT
id_, SUM(cte.num_serves)
FROM
table1 AS t
JOIN
cte ON cte.player_id = t.player_id_p1
AND cte.location = t.location
AND cte.date_ < t.date_
WHERE
t.id_ = 12345
This currently takes over a minute to run for about a million rows.
If I chop out the second part of the UNION_ALL
to give me:
WITH cte AS (
SELECT
num_serves_p1 AS num_serves, player_id_p1 AS player_id, location, date_
FROM
table1
)
SELECT
id_, SUM(cte.num_serves)
FROM
table1 AS t
JOIN
cte ON cte.player_id = t.player_id_p1
AND cte.location = t.location
AND cte.date_ < t.date_
WHERE
t.id_ = 12345
Or the first part of the UNION_ALL
to give me:
WITH cte AS (
SELECT
num_serves_p2 AS num_serves, player_id_p2 AS player_id, location, date_
FROM
table1
)
SELECT
id_, SUM(cte.num_serves)
FROM
table1 AS t
JOIN
cte ON cte.player_id = t.player_id_p1
AND cte.location = t.location
AND cte.date_ < t.date_
WHERE
t.id_ = 12345
Then either of these queries run in under 0.1 seconds.
So I figure all my indexes are good but the issue lies in using UNION_ALL
. Given the drastic difference in performance then my hunch is that in the first query the CTE loads all the rows in table1 (x2) whereas in the second and third queries the CTE only loads the rows required by the joins?
I realise that I could create two CTEs and simply add the results together however at some point I'm going to calculate fields that rely on a date ordered UNION_ALL
CTE.
Any ideas on why I'm seeing the poor performance with the first query and what I might be able to do about it?
Quick final note: I haven't included all the usual table descriptions etc because this is really a pseudo query. I figured there'd be enough to go on as I've isolated the usage of UNION_ALL
as being the likely issue. I'm also not sure how to generate sufficient data to illustrate the issue. Fingers crossed the above is enough.
The Optimizer is not smart enough with unions, joins, CTEs, to to propagate the "id=" all the way through. Instead, it is evaluating each part of the UNION
, creating a temp table with all those rows, and possibly building an index to help with the next step.
It may run even faster if you do the SUM
for each table copy and then add them together. (I can't even tell if that is a valid transformation, and the Optimizer won't even try it.)
More
AND cte.date_ < t.date_
WHERE
t.id_ = 12345
-->
AND cte.date < ( SELECT date_ FROM t WHERE id_ = 12345 )
Or, this might be better:
SELECT ...
FROM ( SELECT date_ FROM t WHERE id_ = 12345 ) AS d
JOIN ... (( the other stuff ))
WHERE cte.date_ < d.date_
and more
SELECT
(
SELECT SUM(num_serves)
FROM table1
WHERE player_id = this.p1_id
) AS p1_prev_serves,
(
SELECT SUM(num_serves)
FROM table1
WHERE player_id = this.p2_id
) AS p2_prev_serves
FROM table1 AS this
WHERE player_id IN (this.p1_id, this.p2_id)
AND this.game_id = 12345;
SELECT SUM(num_serves) AS prev_serves
FROM table1 AS all