I have a MySQL statement that lists a series of products, that belong to different sets, and orders the results as one product for each set.
My products are organized like this.
Set 01 | Set 02 | Set 03 |
---|---|---|
Product 01 | Product 06 | Product 11 |
Product 02 | Product 07 | Product 12 |
Product 03 | Product 08 | Product 13 |
Product 09 | Product 14 | |
Product 15 |
The results look like this
Set 01 - Product 01
Set 02 - Product 06
Set 03 - Product 11
Set 01 - Product 02
Set 02 - Product 07
Set 03 - Product 12
Set 01 - Product 03
Set 02 - Product 08
Set 03 - Product 13
Set 02 - Product 09
Set 03 - Product 14
Set 03 - Product 15
Here is the statement
SELECT * FROM
(SELECT *, (@row_num := if(@set_id = set_id, @row_num + 1, if(@set_id := set_id, 1, 1) ) ) AS rn
FROM (
select ps.product_id, ps.set_id, ps.order FROM
`product_set` ps JOIN sets s ON ps.set_id = s.id
ORDER BY ps.set_id desc, ps.order) ps
CROSS JOIN (SELECT @set_id := -1, @row_num:= 0) params ) AS product_set
WHERE `product_set`.`set_id` IN
(select `set_id` FROM `sets` WHERE `sets`.`status_id` = 50)
The above query performs well and I don't have any issues with it.
Where I get into performance issues is when I add a boat ton of additional sub-queries. I know there is a bunch of code stink in here. SQL is not my primary skill set. I have read that using JOINs is better than sub-queries, but I have no idea how to start implementing them. The query takes around 1.8 seconds and I'd like to shave that down as much as possible.
SELECT * FROM
(SELECT *, (@row_num := if(@set_id = set_id, @row_num + 1, if(@set_id := set_id, 1, 1) ) ) AS rn
FROM (
select `ps`.`product_id`, `ps`.`set_id`, `ps`.`order` FROM
`product_set` ps JOIN `sets` s ON `ps`.`set_id` = `s`.`id`
ORDER BY ps.set_id desc, ps.order) ps
CROSS JOIN (SELECT @set_id := -1, @row_num:= 0) params ) AS product_set
WHERE `product_set`.`set_id` IN
(select `set_id` FROM `sets` WHERE `sets`.`status_id` = 50)
AND
`product_set`.`product_id` IN
(select `product_id` FROM `product_state`
WHERE `product_state`.`state_id` = 1
AND `product_state`.`status_id` = 17)
AND
`product_set`.`product_id` IN
(select `product_id` FROM `customer_product` WHERE `customer_product`.`customer_id` IN
(select `id` FROM `customers` WHERE `type_id` IN (1, 5))
)
AND `product_set`.`product_id` IN
(select `id` FROM `products` WHERE `products`.`camera_id` IN (1, 2))
AND `product_set`.`product_id` IN
(select `product_id` FROM `customer_product` WHERE `customer_id` IN (1, 17, 23, 32))
AND `product_set`.`product_id` IN
(select `product_id` FROM `product_categories` WHERE `category_id` IN (124, 127, 116, 117))
AND `product_set`.`product_id` IN
(select `product_id` FROM `product_categories` WHERE `category_id` IN (111))
ORDER BY `product_set`.`rn` asc , `product_set`.`set_id` desc;
In case anyone comes across this problem in the future. This is how I optimized the query thanks to @Barmar's recommendation. If anyone has any additional optimization ideas, I am all ears.
FWIW The query went from 1.8 seconds to 208ms.
SELECT * FROM
(SELECT *, (@row_num := if(@set_id = set_id, @row_num + 1, if(@set_id := set_id, 1, 1) ) ) AS rn
FROM (
select `ps`.`product_id`, `ps`.`set_id`, `ps`.`order` FROM
`product_set` ps JOIN `sets` s ON `ps`.`set_id` = `s`.`id`
ORDER BY ps.set_id desc, ps.order) ps
CROSS JOIN (SELECT @set_id := -1, @row_num:= 0) params ) AS product_set
JOIN
(SELECT DISTINCT `id` FROM `sets` WHERE `sets`.`status_id` = 50)
AS q1 ON q1.id = `product_set`.`set_id`
JOIN
(SELECT DISTINCT `product_id` FROM `product_state`
WHERE `product_state`.`state_id` = 1
AND `product_state`.`status_id` = 17)
AS q2 ON q2.product_id = product_set.product_id
JOIN
(SELECT DISTINCT `id` FROM `products` WHERE `products`.`camera_id` IN (1, 2))
AS q3 ON q3.id = product_set.product_id
JOIN
(SELECT DISTINCT `product_id` FROM `customer_product` WHERE `customer_id` IN (1, 17, 23, 32))
AS q4 ON q4.product_id = product_set.product_id
JOIN
(SELECT DISTINCT `product_id` FROM `product_categories` WHERE `category_id` IN (124, 127, 116, 117))
AS q5 ON q5.product_id = product_set.product_id
JOIN
(SELECT DISTINCT `product_id` FROM `product_categories` WHERE `category_id` IN (111))
AS q6 ON q6.product_id = product_set.product_id
JOIN
(SELECT DISTINCT `product_id` FROM `customer_product` JOIN
(SELECT DISTINCT `id` FROM `customers` WHERE `type_id` IN (1, 5))
AS q8 ON q8.id = `customer_product`.`customer_id`)
AS q7 ON q7.product_id = product_set.product_id
ORDER BY `product_set`.`rn` asc , `product_set`.`set_id` desc;
I have created a gist with a sqldump with some fake data.
MySQL version 8.0.36
I have also updated the query a bit to conform to the new data, see below.
SELECT * FROM
(SELECT *, (@row_num := if(@set_id = set_id, @row_num + 1, if(@set_id := set_id, 1, 1) ) ) AS rn
FROM (
select `ps`.`product_id`, `ps`.`set_id`, `ps`.`order` FROM
`product_set` ps JOIN `sets` s ON `ps`.`set_id` = `s`.`id`
ORDER BY ps.set_id desc, ps.order) ps
CROSS JOIN (SELECT @set_id := -1, @row_num:= 0) params ) AS product_set
JOIN
(SELECT DISTINCT `id` FROM `sets` WHERE `sets`.`status_id` = 50)
AS q1 ON q1.id = `product_set`.`set_id`
JOIN
(SELECT DISTINCT `product_id` FROM `product_state`
WHERE `product_state`.`state_id` = 1
AND `product_state`.`status_id` = 17)
AS q2 ON q2.product_id = product_set.product_id
JOIN
(SELECT DISTINCT `id` FROM `products` WHERE `products`.`camera_id` IN (1, 2))
AS q3 ON q3.id = product_set.product_id
JOIN
(SELECT DISTINCT `product_id` FROM `customer_product` WHERE `customer_id` IN (1, 5, 6, 8))
AS q4 ON q4.product_id = product_set.product_id
JOIN
(SELECT DISTINCT `product_id` FROM `product_categories` WHERE `category_id` IN (3, 4, 6, 7))
AS q5 ON q5.product_id = product_set.product_id
JOIN
(SELECT DISTINCT `product_id` FROM `product_categories` WHERE `category_id` IN (11))
AS q6 ON q6.product_id = product_set.product_id
JOIN
(SELECT DISTINCT `product_id` FROM `customer_product` JOIN
(SELECT DISTINCT `id` FROM `customers` WHERE `type_id` IN (1, 5))
AS q8 ON q8.id = `customer_product`.`customer_id`)
AS q7 ON q7.product_id = product_set.product_id
ORDER BY `product_set`.`rn` asc , `product_set`.`set_id` desc;
You have two junction tables, customer_product
and product_categories
. Both have surrogate primary keys which add nothing. They also lack composite indices for both directions. In most cases junction tables should be indexed in both directions.
Taking customer_product
as an example:
CREATE TABLE `customer_product` (
`id` int NOT NULL AUTO_INCREMENT,
`customer_id` int unsigned NOT NULL,
`product_id` int unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `customer_id` (`customer_id`,`product_id`),
KEY `customer_id_2` (`customer_id`),
KEY `product_id` (`product_id`)
)
customer_id_2
is redundant, as customer_id is the first column in your customer_id
composite index.product_id
is of little use, as to get the related customer_ids would still require an additional read from the clustered index.A better structure would be:
CREATE TABLE `customer_product` (
`customer_id` int unsigned NOT NULL,
`product_id` int unsigned NOT NULL,
PRIMARY KEY (`customer_id`, `product_id`),
KEY `idx_product_customer` (`product_id`, `customer_id`)
)
There's currently no index on sets.status_id
. If this criterion is selective enough you should add the index.
There are duplicate indices on product_set.product_id
and product_set.set_id
.
I am not sure how this query will compare with the ones you have already tried, but it is worth trying:
SELECT *
FROM (
SELECT `ps`.`product_id`, `ps`.`set_id`, `ps`.`order`,
ROW_NUMBER() OVER (PARTITION BY `ps`.`set_id` ORDER BY `ps`.`set_id` DESC, `ps`.`order` ASC) AS rn
FROM `product_set` `ps`
JOIN `sets` `s` ON `ps`.`set_id` = `s`.`id`
WHERE `s`.`status_id` = 50 -- moved in from q1 as affects whole set
) AS `product_set`
WHERE EXISTS (
SELECT 1 FROM `product_state` `ps`
WHERE `ps`.`state_id` = 1
AND `ps`.`status_id` = 17
AND `ps`.`product_id` = `product_set`.`product_id`
)
AND EXISTS (
SELECT 1 FROM `products` `p`
WHERE `p`.`camera_id` IN (1, 2)
AND `p`.`id` = `product_set`.`product_id`
)
AND EXISTS (
-- q4, q7 & q8 combined
SELECT 1 FROM `customer_product` `cp`
JOIN `customers` `c` ON `cp`.`customer_id` = `c`.`id`
WHERE `cp`.`product_id` = `product_set`.`product_id`
AND `c`.`type_id` IN (1, 5)
AND `cp`.`customer_id` IN (1, 5, 6, 8)
)
AND EXISTS (
SELECT 1 FROM `product_categories` `pc`
WHERE `pc`.`category_id` IN (3, 4, 6, 7)
AND `pc`.`product_id` = `product_set`.`product_id`
)
AND EXISTS (
SELECT 1 FROM `product_categories` `pc`
WHERE `pc`.`category_id` IN (11)
AND `pc`.`product_id` = `product_set`.`product_id`
)
ORDER BY `product_set`.`rn` ASC, `product_set`.`set_id` DESC;
With the "fake data" provided, the query time went from 92ms to 5ms, but I suspect this data is not a good representation of your real dataset.