I have 3 tables. Each of them contain (among others) the columns UserID
and Period
.
In order to get all the Periods
associated with a specific user, I'm using unions like this:
# Original Query
SELECT Period FROM table_1 WHERE UserID = :user
UNION SELECT Period FROM table_2 WHERE UserID = :user
UNION SELECT Period FROM table_3 WHERE UserID = :user
ORDER BY Period ASC;
The results I get are what I expect. Since I'm not using UNION ALL
, all duplicate results get automatically filtered out.
The debugger this project is using thinks the query MIGHT be slow. This is the query plan for it-
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | table_1 | ref | UserID | UserID | 4 | const | 29 | NULL |
2 | UNION | table_2 | ref | UserID | UserID | 4 | const | 5 | NULL |
3 | UNION | table_3 | ref | UserID | UserID | 4 | const | 4 | NULL |
NULL | UNION RESULT | <union1,2,3> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
I thought maybe using SELECT DISTINCT
instead of SELECT
would help since each SELECT
result set would get smaller before the UNION
.
# Updated Query
SELECT DISTINCT Period FROM table_1 WHERE UserID = :user
UNION SELECT DISTINCT Period FROM table_2 WHERE UserID = :user
UNION SELECT DISTINCT Period FROM table_3 WHERE UserID = :user
ORDER BY Period ASC;
However this only seems to make the query plan use more temp tables.
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
---|---|---|---|---|---|---|---|---|---|
1 | PRIMARY | table_1 | ref | UserID | UserID | 4 | const | 29 | Using where; Using temporary |
2 | UNION | table_2 | ref | UserID | UserID | 4 | const | 5 | Using where; Using temporary |
3 | UNION | table_3 | ref | UserID | UserID | 4 | const | 4 | Using where; Using temporary |
NULL | UNION RESULT | <union1,2,3> | ALL | NULL | NULL | NULL | NULL | NULL | Using temporary |
I've also tried running this same query repeating lines 2-3 (the UNION
s) across 500 lines to see if there were noticeable differences. Time-wise, the results were pretty similar.
Analyzing the queries, using DISTINCT seemed to make the queries take less time returning the results. However, the query now had to clean up all temp tables, which ended up making the final time for both queries very similar.
My question(s) to all MySQL gurus, is there a way to make this query faster or more optimized without changing the table structures?
Maybe with some indexes? (On that note, Period
is a VARCHAR and the tables are all MyISAM
)
Do the order of the tables matter in the query? I purposedly tried to place the biggest table first.
# Original Query
SELECT Period FROM table_1 WHERE UserID = :user
UNION SELECT Period FROM table_2 WHERE UserID = :user
UNION SELECT Period FROM table_3 WHERE UserID = :user
ORDER BY Period ASC;
It is processed as:
# Updated Query
SELECT DISTINCT Period FROM table_1 WHERE UserID = :user
UNION SELECT DISTINCT Period FROM table_2 WHERE UserID = :user
UNION SELECT DISTINCT Period FROM table_3 WHERE UserID = :user
ORDER BY Period ASC;
It is processed as:
# Recommended Query
SELECT DISTINCT Period
FROM ( SELECT Period FROM table_1 WHERE UserID = :user
UNION ALL
SELECT Period FROM table_2 WHERE UserID = :user
UNION ALL
SELECT Period FROM table_3 WHERE UserID = :user ) AS total
ORDER BY Period ASC;