Tags: sql, mysql, join, sql-execution-plan

Nested loop inner join with index lookup and filter is slow


I have this query I'm running in MySQL:

SELECT
    count(*)
FROM
    library AS l
    JOIN plays AS p ON p.user_id = l.user_id AND
    l.path = p.path
WHERE
    l.user_id = 20977 AND
    p.time >= '2022-10-17';

When EXPLAIN ANALYZE is run:

| -> Aggregate: count(0)  (cost=1085653.55 rows=6692) (actual time=12576.265..12576.266 rows=1 loops=1)
    -> Nested loop inner join  (cost=1084984.37 rows=6692) (actual time=40.604..12566.569 rows=56757 loops=1)
        -> Index lookup on l using user_id_2 (user_id=20977)  (cost=116747.95 rows=106784) (actual time=13.153..3783.204 rows=59631 loops=1)
        -> Filter: ((p.user_id = 20977) and (p.`time` >= TIMESTAMP'2022-10-17 00:00:00'))  (cost=8.24 rows=0) (actual time=0.135..0.147 rows=1 loops=59631)
            -> Index lookup on p using path (path=l.`path`)  (cost=8.24 rows=8) (actual time=0.090..0.146 rows=1 loops=59631)
 |
1 row in set (12.76 sec)

I obviously want to make this faster!

Table definitions

CREATE TABLE `library` (
  `user_id` int NOT NULL,
  `name` varchar(20) COLLATE utf8mb4_general_ci NOT NULL,
  `path` varchar(512) COLLATE utf8mb4_general_ci NOT NULL,
  `title` varchar(512) COLLATE utf8mb4_general_ci NOT NULL,
  `created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `edited` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  `db_id` int NOT NULL,
  `tag` varchar(64) COLLATE utf8mb4_general_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

CREATE TABLE `plays` (
  `user_id` int DEFAULT NULL,
  `name` varchar(20) CHARACTER SET utf8 DEFAULT NULL,
  `path` varchar(512) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `time` datetime DEFAULT CURRENT_TIMESTAMP,
  `play_id` int NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;

ALTER TABLE `library`
  ADD PRIMARY KEY (`db_id`),
  ADD KEY `user_id_loc` (`user_id`,`name`,`path`(191)),
  ADD KEY `edited` (`edited`),
  ADD KEY `created` (`created`),
  ADD KEY `title` (`title`),
  ADD KEY `user_id` (`user_id`),
  ADD INDEX `user_id_by_title` (`user_id`, `title`);

ALTER TABLE `plays`
  ADD PRIMARY KEY (`play_id`),
  ADD KEY `user_id` (`user_id`,`name`,`path`(255)),
  ADD KEY `user_id_2` (`user_id`,`name`),
  ADD KEY `time` (`time`),
  ADD KEY `path` (`path`),
  ADD KEY `user_id_3` (`user_id`,`name`,`path`,`time`);

It looks like the killer is the loop over those 59631 rows from library.

Would an index on (user_id, time) make it faster?
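
Something like this is what I have in mind (hypothetical index name, I haven't actually added it yet):

-- Index I'm considering adding to plays; the name is just a placeholder
ALTER TABLE plays ADD KEY user_id_time (user_id, `time`);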

Interestingly, the user_id_2 index the optimizer picked is actually an index on (user_id, title), not the plain user_id index. I'm not sure why user_id_2 is chosen, given that title isn't used in the query.
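
(SHOW INDEX is how I checked which columns user_id_2 actually covers, in case the ALTER statements above don't tell the full story:)

SHOW INDEX FROM library;
SHOW INDEX FROM plays;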


Solution

  • I tested your query and tried a different index in each table.

    ALTER TABLE library ADD KEY bk1 (user_id, path); 
    
    ALTER TABLE plays ADD KEY bk2 (user_id, path, time); 
    
    EXPLAIN SELECT
        COUNT(*)
    FROM
        library AS l USE INDEX (bk1)
        JOIN plays AS p USE INDEX (bk2)
          ON p.user_id = l.user_id 
          AND l.path = p.path
    WHERE
        l.user_id = 20977 
        AND p.time >= '2022-10-17';
    
    +----+-------------+-------+------------+------+---------------+------+---------+-------------------+------+----------+--------------------------+
    | id | select_type | table | partitions | type | possible_keys | key  | key_len | ref               | rows | filtered | Extra                    |
    +----+-------------+-------+------------+------+---------------+------+---------+-------------------+------+----------+--------------------------+
    |  1 | SIMPLE      | l     | NULL       | ref  | bk1           | bk1  | 4       | const             |    1 |   100.00 | Using index              |
    |  1 | SIMPLE      | p     | NULL       | ref  | bk2           | bk2  | 2056    | const,test.l.path |    1 |   100.00 | Using where; Using index |
    +----+-------------+-------+------------+------+---------------+------+---------+-------------------+------+----------+--------------------------+
    

    The note "Using index" in each row of the EXPLAIN report shows that it's getting the benefit of a covering index for both tables.

    I didn't use prefix index syntax, because that would spoil the covering index optimization. Prefix indexes aren't necessary for this example anyway: modern MySQL versions default to an InnoDB row format (DYNAMIC) that allows index keys up to 3072 bytes, whereas older versions only allowed 767 bytes by default. A full varchar(512) column in utf8mb4 needs at most 2048 bytes, so it fits comfortably.
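
    For contrast, a prefix-index version would look something like this (hypothetical name, not recommended here). Because only the first 191 characters of path would be stored in the index, InnoDB couldn't resolve the full path comparison or the count from the index alone, and the "Using index" note would disappear.

    -- Hypothetical prefix-index alternative: it loses the covering-index benefit,
    -- since the index no longer holds the complete path value
    ALTER TABLE plays ADD KEY bk2_prefix (user_id, path(191), `time`);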

    In my test the tables had zero rows, so I had to use index hints to make the optimizer choose my new indexes. In tables with a substantial number of rows, the optimizer might choose the new indexes on its own.
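
    If the optimizer does start picking the new indexes once the tables have realistic row counts, you can drop the hints and run the query as you originally wrote it. Refreshing the index statistics may help it along (this is just the standard ANALYZE TABLE statement, nothing specific to these indexes):

    -- Refresh index statistics so the optimizer has up-to-date cardinality estimates
    ANALYZE TABLE library, plays;

    SELECT
        COUNT(*)
    FROM
        library AS l
        JOIN plays AS p
          ON p.user_id = l.user_id
          AND l.path = p.path
    WHERE
        l.user_id = 20977
        AND p.time >= '2022-10-17';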