sqlmysqlquery-optimizationunpivot

Slow MySQL query with multiple sub-queries in where clause


I have a MySQL statement that lists a series of products, that belong to different sets, and orders the results as one product for each set.

My products are organized like this.

Set 01 Set 02 Set 03
Product 01 Product 06 Product 11
Product 02 Product 07 Product 12
Product 03 Product 08 Product 13
Product 09 Product 14
Product 15

The results look like this

Set 01 - Product 01
Set 02 - Product 06
Set 03 - Product 11
Set 01 - Product 02
Set 02 - Product 07
Set 03 - Product 12
Set 01 - Product 03
Set 02 - Product 08
Set 03 - Product 13
Set 02 - Product 09
Set 03 - Product 14
Set 03 - Product 15

Here is the statement

SELECT * FROM
    (SELECT *, (@row_num := if(@set_id = set_id, @row_num + 1, if(@set_id := set_id, 1, 1) ) ) AS rn
FROM (
    select ps.product_id, ps.set_id, ps.order FROM
    `product_set` ps JOIN sets s ON ps.set_id = s.id
    ORDER BY ps.set_id desc, ps.order) ps

CROSS JOIN (SELECT @set_id := -1, @row_num:= 0) params ) AS product_set

    WHERE `product_set`.`set_id` IN 
        (select `set_id` FROM `sets` WHERE `sets`.`status_id` = 50) 

The above query performs well and I don't have any issues with it.

Where I get into performance issues is when I add a boat ton of additional sub-queries. I know there is a bunch of code stink in here. SQL is not my primary skill set. I have read that using JOINs is better than sub-queries, but I have no idea how to start implementing them. The query takes around 1.8 seconds and I'd like to shave that down as much as possible.


SELECT * FROM
    (SELECT *, (@row_num := if(@set_id = set_id, @row_num + 1, if(@set_id := set_id, 1, 1) ) ) AS rn
FROM (
    select `ps`.`product_id`, `ps`.`set_id`, `ps`.`order` FROM
    `product_set` ps JOIN `sets` s ON `ps`.`set_id` = `s`.`id`
    ORDER BY ps.set_id desc, ps.order) ps

CROSS JOIN (SELECT @set_id := -1, @row_num:= 0) params ) AS product_set

    WHERE `product_set`.`set_id` IN 
        (select `set_id` FROM `sets` WHERE `sets`.`status_id` = 50) 
    
    AND 
    `product_set`.`product_id` IN 
        (select `product_id` FROM `product_state` 
                                WHERE `product_state`.`state_id` = 1 
                                  AND `product_state`.`status_id` = 17)

    AND 
    `product_set`.`product_id` IN 
        (select `product_id` FROM `customer_product` WHERE `customer_product`.`customer_id` IN
            (select `id` FROM `customers` WHERE `type_id` IN (1, 5))
        ) 
    AND `product_set`.`product_id` IN 
        (select `id` FROM `products` WHERE `products`.`camera_id` IN (1, 2)) 
    
    AND `product_set`.`product_id` IN 
        (select `product_id` FROM `customer_product` WHERE `customer_id` IN (1, 17, 23, 32)) 
    
    AND `product_set`.`product_id` IN
        (select `product_id` FROM `product_categories` WHERE `category_id` IN (124, 127, 116, 117))
    
    AND `product_set`.`product_id` IN 
        (select `product_id` FROM `product_categories` WHERE  `category_id` IN (111))
    
    ORDER BY `product_set`.`rn` asc , `product_set`.`set_id` desc;

UPDATE

In case anyone comes across this problem in the future. This is how I optimized the query thanks to @Barmar's recommendation. If anyone has any additional optimization ideas, I am all ears.

FWIW The query went from 1.8 seconds to 208ms.

SELECT * FROM
    (SELECT *, (@row_num := if(@set_id = set_id, @row_num + 1, if(@set_id := set_id, 1, 1) ) ) AS rn
     FROM (
              select `ps`.`product_id`, `ps`.`set_id`, `ps`.`order` FROM
                  `product_set` ps JOIN `sets` s ON `ps`.`set_id` = `s`.`id`
              ORDER BY ps.set_id desc, ps.order) ps
CROSS JOIN (SELECT @set_id := -1, @row_num:= 0) params ) AS product_set 

JOIN 
    (SELECT DISTINCT `id` FROM `sets` WHERE `sets`.`status_id` = 50) 
        AS q1 ON q1.id = `product_set`.`set_id`
JOIN
    (SELECT DISTINCT `product_id` FROM `product_state` 
     WHERE `product_state`.`state_id` = 1 
     AND `product_state`.`status_id` = 17) 
        AS q2 ON q2.product_id = product_set.product_id
JOIN 
    (SELECT DISTINCT `id` FROM `products` WHERE `products`.`camera_id` IN (1, 2)) 
        AS q3 ON q3.id = product_set.product_id 
JOIN 
    (SELECT DISTINCT `product_id` FROM `customer_product` WHERE `customer_id` IN (1, 17, 23, 32))
        AS q4 ON q4.product_id = product_set.product_id 
JOIN 
    (SELECT DISTINCT `product_id` FROM `product_categories` WHERE `category_id` IN (124, 127, 116, 117))
        AS q5 ON q5.product_id = product_set.product_id
JOIN 
    (SELECT DISTINCT `product_id` FROM `product_categories` WHERE `category_id` IN (111))
        AS q6 ON q6.product_id = product_set.product_id
JOIN 
    (SELECT DISTINCT `product_id` FROM `customer_product` JOIN 
        (SELECT DISTINCT `id` FROM `customers` WHERE `type_id` IN (1, 5)) 
            AS q8 ON q8.id = `customer_product`.`customer_id`) 
        AS q7 ON q7.product_id = product_set.product_id
    
ORDER BY `product_set`.`rn` asc , `product_set`.`set_id` desc; 

Update #2

I have created a gist with a sqldump with some fake data.

MySQL version 8.0.36

I have also updated the query a bit to conform to the new data, see below.

SELECT * FROM
    (SELECT *, (@row_num := if(@set_id = set_id, @row_num + 1, if(@set_id := set_id, 1, 1) ) ) AS rn
     FROM (
              select `ps`.`product_id`, `ps`.`set_id`, `ps`.`order` FROM
                  `product_set` ps JOIN `sets` s ON `ps`.`set_id` = `s`.`id`
              ORDER BY ps.set_id desc, ps.order) ps
CROSS JOIN (SELECT @set_id := -1, @row_num:= 0) params ) AS product_set 

JOIN 
    (SELECT DISTINCT `id` FROM `sets` WHERE `sets`.`status_id` = 50) 
        AS q1 ON q1.id = `product_set`.`set_id`
JOIN
    (SELECT DISTINCT `product_id` FROM `product_state` 
     WHERE `product_state`.`state_id` = 1 
     AND `product_state`.`status_id` = 17) 
        AS q2 ON q2.product_id = product_set.product_id
JOIN 
    (SELECT DISTINCT `id` FROM `products` WHERE `products`.`camera_id` IN (1, 2)) 
        AS q3 ON q3.id = product_set.product_id 
JOIN 
    (SELECT DISTINCT `product_id` FROM `customer_product` WHERE `customer_id` IN (1, 5, 6, 8))
        AS q4 ON q4.product_id = product_set.product_id 
JOIN 
    (SELECT DISTINCT `product_id` FROM `product_categories` WHERE `category_id` IN (3, 4, 6, 7))
        AS q5 ON q5.product_id = product_set.product_id
JOIN 
    (SELECT DISTINCT `product_id` FROM `product_categories` WHERE `category_id` IN (11))
        AS q6 ON q6.product_id = product_set.product_id
JOIN 
    (SELECT DISTINCT `product_id` FROM `customer_product` JOIN 
        (SELECT DISTINCT `id` FROM `customers` WHERE `type_id` IN (1, 5)) 
            AS q8 ON q8.id = `customer_product`.`customer_id`) 
        AS q7 ON q7.product_id = product_set.product_id
ORDER BY `product_set`.`rn` asc , `product_set`.`set_id` desc; 

Solution

  • You have two junction tables, customer_product and product_categories. Both have surrogate primary keys which add nothing. They also lack composite indices for both directions. In most cases junction tables should be indexed in both directions.

    Taking customer_product as an example:

    CREATE TABLE `customer_product` (
      `id` int NOT NULL AUTO_INCREMENT,
      `customer_id` int unsigned NOT NULL,
      `product_id` int unsigned NOT NULL,
      PRIMARY KEY (`id`),
      UNIQUE KEY `customer_id` (`customer_id`,`product_id`),
      KEY `customer_id_2` (`customer_id`),
      KEY `product_id` (`product_id`)
    )
    
    1. customer_id_2 is redundant, as customer_id is the first column in your customer_id composite index.
    2. The single column index on product_id is of little use, as to get the related customer_ids would still require an additional read from the clustered index.

    A better structure would be:

    CREATE TABLE `customer_product` (
      `customer_id` int unsigned NOT NULL,
      `product_id` int unsigned NOT NULL,
      PRIMARY KEY (`customer_id`, `product_id`),
      KEY `idx_product_customer` (`product_id`, `customer_id`)
    )
    

    There's currently no index on sets.status_id. If this criterion is selective enough you should add the index.

    There are duplicate indices on product_set.product_id and product_set.set_id.

    I am not sure how this query will compare with the ones you have already tried, but it is worth trying:

    SELECT *
    FROM (
        SELECT `ps`.`product_id`, `ps`.`set_id`, `ps`.`order`,
            ROW_NUMBER() OVER (PARTITION BY `ps`.`set_id` ORDER BY `ps`.`set_id` DESC, `ps`.`order` ASC) AS rn
        FROM `product_set` `ps`
        JOIN `sets` `s` ON `ps`.`set_id` = `s`.`id`
        WHERE `s`.`status_id` = 50 -- moved in from q1 as affects whole set
    ) AS `product_set`
    
    WHERE EXISTS (
        SELECT 1 FROM `product_state` `ps`
        WHERE `ps`.`state_id` = 1 
        AND `ps`.`status_id` = 17
        AND `ps`.`product_id` = `product_set`.`product_id`
    )
    AND EXISTS (
        SELECT 1 FROM `products` `p`
        WHERE `p`.`camera_id` IN (1, 2)
        AND `p`.`id` = `product_set`.`product_id`
    )
    AND EXISTS (
        -- q4, q7 & q8 combined
        SELECT 1 FROM `customer_product` `cp`
        JOIN `customers` `c` ON `cp`.`customer_id` = `c`.`id`
        WHERE `cp`.`product_id` = `product_set`.`product_id`
        AND `c`.`type_id` IN (1, 5)
        AND `cp`.`customer_id` IN (1, 5, 6, 8)
    )
    AND EXISTS (
        SELECT 1 FROM `product_categories` `pc`
        WHERE `pc`.`category_id` IN (3, 4, 6, 7)
        AND `pc`.`product_id` = `product_set`.`product_id`
    )
    AND EXISTS (
        SELECT 1 FROM `product_categories` `pc`
        WHERE `pc`.`category_id` IN (11)
        AND `pc`.`product_id` = `product_set`.`product_id`
    )
    ORDER BY `product_set`.`rn` ASC, `product_set`.`set_id` DESC;
    

    With the "fake data" provided, the query time went from 92ms to 5ms, but I suspect this data is not a good representation of your real dataset.