mysqlquery-optimizationsql-order-bydatabase-indexes

MySQL: Is there any way to optimize multiple table ORDER BY with following DB structure?


I have two tables with following structure and indexes (real names are changed for purpose):

mysql> describe table1;
+---------+---------------------+------+-----+-------------------+----------------+
| Field   | Type                | Null | Key | Default           | Extra          |
+---------+---------------------+------+-----+-------------------+----------------+
| id      | bigint(20) unsigned | NO   | PRI | NULL              | auto_increment |
| field1  | varchar(255)        | NO   | UNI | NULL              |                |
| date    | timestamp           | NO   | MUL | CURRENT_TIMESTAMP |                |
| text    | varchar(10000)      | NO   |     | NULL              |                |
| flag    | tinyint(1)          | YES  |     | 0                 |                |
+---------+---------------------+------+-----+-------------------+----------------+

mysql> show indexes from table1;
+--------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name             | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table1 |          0 | PRIMARY              |            1 | id          | A         |     1420047 |     NULL | NULL   |      | BTREE      |         |               |
| table1 |          0 | table1_field1_unique |            1 | field1      | A         |     1420047 |     NULL | NULL   |      | BTREE      |         |               |
| table1 |          1 | table1_date_idx      |            1 | date        | A         |     1420047 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+--------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

mysql> describe table2;
+------------------------+---------------------+------+-----+---------+----------------+
| Field                   | Type                | Null | Key | Default | Extra          |
+------------------------+---------------------+------+-----+---------+----------------+
| id                      | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| table1_id               | bigint(20) unsigned | NO   | MUL | NULL    |                |
| some1_id                | bigint(20) unsigned | YES  | MUL | NULL    |                |
| some1_name              | varchar(255)        | YES  | MUL | NULL    |                |
| some2_id                | bigint(20) unsigned | NO   | MUL | NULL    |                |
| some2_name              | varchar(255)        | NO   | MUL | NULL    |                |
| some3_name              | varchar(255)        | YES  | MUL | NULL    |                |
| some4_email             | varchar(255)        | YES  |     | NULL    |                |
| some4_name              | varchar(255)        | YES  | MUL | NULL    |                |
| some4_place1_gift       | varchar(255)        | YES  |     | NULL    |                |
| some4_place2_gift       | varchar(255)        | YES  |     | NULL    |                |
| some4_place3_gift       | varchar(255)        | YES  |     | NULL    |                |
+------------------------+---------------------+------+-----+---------+----------------+

mysql> show indexes from table2;
+--------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table        | Non_unique | Key_name                 | Seq_in_index | Column_name| Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| table2|          0 | PRIMARY                         |            1 | id         | A         |      462911 |     NULL | NULL   |      | BTREE      |         |               |
| table2|          1 | table2_table1_table1_id_foreign |            1 | table1_id  | A         |      462911 |     NULL | NULL   |      | BTREE      |         |               |
| table2|          1 | some4_name_idx                  |            1 | some4_name | A         |        5645 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2|          1 | some3_name_idx                  |            1 | some3_name | A         |        3560 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2|          1 | some2_id_idx                    |            1 | some2_id   | A         |         116 |     NULL | NULL   |      | BTREE      |         |               |
| table2|          1 | some1_id_idx                    |            1 | some1_id   | A         |         390 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2|          1 | some1_name_idx                  |            1 | some1_name | A         |        1727 |     NULL | NULL   | YES  | BTREE      |         |               |
| table2|          1 | some2_name_idx                  |            1 | some2_name | A         |         221 |     NULL | NULL   |      | BTREE      |         |               |
+--------------+------------+--------------------------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

I have the following query:

SELECT
    table1.id AS table1_id,
    table1.field1,
    table1.date,
    table1.text,
    table2.id AS table2_id,
    table2.some1_id,
    table2.some1_name,
    table2.some2_id,
    table2.some2_name,
    table2.some3_name,
    table2.some4_email,
    table2.some4_name,
    table2.some4_place1_gift,
    table2.some4_place2_gift,
    table2.some4_place3_gift
FROM
    table2
INNER JOIN
    table1 ON table2.table1_id = table1.id
WHERE
    table2.some1_name = 'Some1_Name_Example'
    AND table2.some2_name = 'Some2_Name_Example'
    AND table2.some3_name = 'Some3_Name_Example'
    AND (
        table2.some2_id IN (1, 22, 975, 5981, 6127, 10861)
        OR table2.some1_id IN (2564, 4886, 12514, 12724, 13905, 15491, 16295, 18125, 22162, 20702, 20704, 21503)
        OR (table2.some2_id = 1277 AND table1.date < '2022-03-27 00:00:00')
        OR (table2.some2_id = 1678 AND table1.date < '2023-06-16 00:00:00')
        OR (table2.some2_id = 9519 AND table1.date < '2021-01-05 00:00:00')
        OR (table2.some1_id = 4648 AND table1.date < '2023-06-16 00:00:00')
    )
ORDER BY
    table1.date DESC,
    table2.id DESC
LIMIT 200;

Query time: 1.04 sec. This is EXPLAIN statement for this query:

+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+----------------------------+--------+---------------------------------------------------------------------+
| id | select_type | table        | type   | possible_keys                                                                                       | key             | key_len | ref                  | rows   | Extra                                                               |
+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+----------------------------+--------+---------------------------------------------------------------------+
|  1 | SIMPLE      | table2       | ref    | table2_table1_id_foreign,some3_name_idx,some2_id_idx,some1_id_idx,some1_name_index,some2_name_index | some3_name_idx  | 768     | const                | 231455 | Using index condition; Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | events       | eq_ref | PRIMARY,table1_date_idx                                                                             | PRIMARY         | 8       | db.table2.table1_id  |      1 | Using where                                                         |
+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+----------------------------+--------+---------------------------------------------------------------------+

Then I remove ORDER BY table1.date DESC clause (leaving just ORDER BY table2.id DESC).

Query time: 0.12 sec. EXPLAIN statement:

+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+--
| id | select_type | table        | type   | possible_keys                                                                                       | key             | key_len | ref                  | rows   | Extra        |
+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+--
|  1 | SIMPLE      | table2       | ref    | table2_table1_id_foreign,some3_name_idx,some2_id_idx,some1_id_idx,some1_name_index,some2_name_index | some3_name_idx  | 768     | const                | 231455 | Using where  |
|  1 | SIMPLE      | events       | eq_ref | PRIMARY,table1_date_idx                                                                             | PRIMARY         | 8       | db.table2.table1_id  |      1 | Using where  |
+----+-------------+--------------+--------+-------------------------------------------------------------------------------------------------------------------------------------------------+-----------------+---------+---

Looks like ORDER BY table1.date DESC creates massive overhead in query execution. I am not sure if anything can be done in such case to optimize a query. Any suggestions?

P.S. I know that WHERE clauses look strange, but my main question in ORDER BY optimization.


Solution

  • You have two performance problems with this scenario.

    1. SELECT whole-mess-of-rows ... ORDER BY something LIMIT small-number is a well-known performance antipattern. To satisfy the query the database must fetch a lot of data, sort it in a particular order, then discard all but a tiny fraction of it. It's sometimes possible to use an index to avoid the sorting task.

      This is why leaving out the ORDER BY clause makes your query faster. It just returns the first 200 rows, whichever rows were generated first. But those rows won't be very useful, because their order is unpredictable. You can't reasonably omit ORDER BY.

    2. A covering index will help the WHERE filters on this query. Try this index:

      CREATE INDEX names ON table2
         (some1_name, some2_name, some_3_name, some2_id, some1_id)
      

      It should make things a bit faster.

    3. If you're still strugging with performance you can try a so-called deferred join. First use a subquery to get the id values of the rows you need. Like this.

       SELECT
           table1.id AS table1_id,
           table2.id AS table2_id
       FROM
           table2
       INNER JOIN
           table1 ON table2.table1_id = table1.id
       WHERE
           table2.some1_name = 'Some1_Name_Example'
           AND table2.some2_name = 'Some2_Name_Example'
           AND table2.some3_name = 'Some3_Name_Example'
           AND (
               table2.some2_id IN (1, 22, 975, 5981, 6127, 10861)
               OR table2.some1_id IN (2564, 4886, 12514, 12724, 13905, 15491, 16295, 18125, 22162, 20702, 20704, 21503)
               OR (table2.some2_id = 1277 AND table1.date < '2022-03-27 00:00:00')
               OR (table2.some2_id = 1678 AND table1.date < '2023-06-16 00:00:00')
               OR (table2.some2_id = 9519 AND table1.date < '2021-01-05 00:00:00')
               OR (table2.some1_id = 4648 AND table1.date < '2023-06-16 00:00:00')
           )
       ORDER BY
           table1.date DESC,
           table2.id DESC
       LIMIT 200;
      

      That has the same sort-then-discard problem as your original query, but each row that gets sorted is far far smaller -- just a couple of id values and a `date.

      Then join it to your tables to get the details (I didn't debug this).

       SELECT
           table1.id AS table1_id,
           table1.field1,
           table1.date,
           table1.text,
           table2.id AS table2_id,
           table2.some1_id,
           table2.some1_name,
           table2.some2_id,
           table2.some2_name,
           table2.some3_name,
           table2.some4_email,
           table2.some4_name,
           table2.some4_place1_gift,
           table2.some4_place2_gift,
           table2.some4_place3_gift
           FROM (
               SELECT
                   table1.id AS table1_id,
                   table2.id AS table2_id
               FROM
                   table2
               INNER JOIN
                   table1 ON table2.table1_id = table1.id
               WHERE
                   table2.some1_name = 'Some1_Name_Example'
                   AND table2.some2_name = 'Some2_Name_Example'
                   AND table2.some3_name = 'Some3_Name_Example'
                   AND (
                       table2.some2_id IN (1, 22, 975, 5981, 6127, 10861)
                       OR table2.some1_id IN (2564, 4886, 12514, 12724, 13905, 15491, 16295, 18125, 22162, 20702, 20704, 21503)
                       OR (table2.some2_id = 1277 AND table1.date < '2022-03-27 00:00:00')
                       OR (table2.some2_id = 1678 AND table1.date < '2023-06-16 00:00:00')
                       OR (table2.some2_id = 9519 AND table1.date < '2021-01-05 00:00:00')
                       OR (table2.some1_id = 4648 AND table1.date < '2023-06-16 00:00:00')
                   )
               ORDER BY
                   table1.date DESC,
                   table2.id DESC
               LIMIT 200
            ) ids
       INNER JOIN table1 ON ids.table1_id = table1.id
       INNER JOIN table2 ON ids.table2_id = table2.id AND table2.table1_id = table1.id
       ORDER BY     
          table1.date DESC,
          table2.id DESC