mysqlsemi-join

MYSQL query on products within variant price range without duplicates


I am trying to query my products table such that my query returns a product set within a certain variant price range -- products have many variants and variants belong to products. I have the following query which is working but it is very slow -- the query takes 1.66 seconds.

SELECT colortags.tag, products.*, (SELECT variants.price FROM variants WHERE variants.product_id = products.product_id LIMIT 1) price, collects.position 
FROM products
INNER JOIN wp_wps_collects collects 
ON products.product_id = collects.product_id 
AND collects.collection_id = 123456788
INNER JOIN wp_wps_tags colortags 
ON colortags.product_id = collects.product_id 
AND (colortags.tag = 'black' OR colortags.tag = 'nav') 
INNER JOIN wp_wps_tags styletags 
ON styletags.product_id = collects.product_id 
AND (styletags.tag = 'purses')
WHERE (SELECT variants.price FROM variants WHERE variants.product_id = products.product_id AND variants.price >= 200 AND variants.price < 300 LIMIT 1) > 0
AND (SELECT variants.price FROM variants WHERE variants.product_id = products.product_id AND variants.price >= 200 AND variants.price < 300 LIMIT 1) <> 0

In the WHERE statement, I've tried just checking the price value created by my subquery in my SELECT statement but MYSQL tells me that price is undefined. Here's the SQL:

SELECT colortags.tag, products.*, (SELECT variants.price FROM wp_wps_variants variants WHERE variants.product_id = products.product_id LIMIT 1) price, collects.position 
FROM wp_wps_products products
INNER JOIN wp_wps_collects collects 
ON products.product_id = collects.product_id 
AND collects.collection_id = 98515058801
INNER JOIN wp_wps_tags colortags 
ON colortags.product_id = collects.product_id 
AND (colortags.tag = 'color:generic-black' OR colortags.tag = 'color:navy') 
INNER JOIN wp_wps_tags styletags 
ON styletags.product_id = collects.product_id 
AND (styletags.tag = 'style:totes')
WHERE price >= 200 
AND price < 300

This is the error message I get from the immediately above SQL:

Unknown column 'price' in 'where clause'

I've also tried to use a series of joins on the variants table but then I get duplicate products returned. Here's the SQL

SELECT colortags.tag, products.*, (SELECT variants.price FROM variants WHERE variants.product_id = products.product_id LIMIT 1) price, collects.position 
FROM products
INNER JOIN collects 
ON products.product_id = collects.product_id 
AND collects.collection_id = 1234566
INNER JOIN colortags 
ON colortags.product_id = collects.product_id 
AND (colortags.tag = 'black' OR colortags.tag = 'navy') 
INNER JOIN styletags 
ON styletags.product_id = collects.product_id 
AND (styletags.tag = 'purses')
INNER JOIN variants
ON variants.product_id = products.product_id
AND variants.price >= 200 
AND variants.price < 300
AND variants.price > 0
AND variants.price <> 0

What I would like is the query results obtained from the first SQL statement without the performance hit. I'd like to think that I can write SQL with an INNER JOIN on the variants table to filter out variants of the incorrect price without getting duplicate products but I cannot identify how.

Does anyone know how I can write SQL here that would allow me to query products within a certain variant price range without getting duplicate products back and without a serious performance hit?


Solution

  • You're doing a semi-join, which means you want to do a join but you only want one row of result, even if there are multiple matches.

    One way you can do this is to join to a derived table, which uses GROUP BY in a subquery to reduce the variants to one row per product_id.

    SELECT colortags.tag, products.*, prices.price, collects.position 
    FROM products
    INNER JOIN wp_wps_collects collects 
      ON products.product_id = collects.product_id 
      AND collects.collection_id = 123456788
    INNER JOIN wp_wps_tags colortags 
      ON colortags.product_id = collects.product_id 
      AND (colortags.tag = 'black' OR colortags.tag = 'nav') 
    INNER JOIN wp_wps_tags styletags 
      ON styletags.product_id = collects.product_id 
      AND (styletags.tag = 'purses')
    INNER JOIN (
      SELECT v.product_id, MIN(v.price) AS price FROM variants v
      INNER JOIN wp_wps_collects c ON v.product_id = c.product_id
      WHERE c.collection_id = 123456788
      GROUP BY product_id ORDER BY NULL
    ) prices
      ON products.product_id = prices.product_id
    

    I assumed you'd be interested in the minimum price per product. You could alternatively use MAX() in the derived table.

    I had to duplicate the join to wp_wps_collects so the derived table would be limited to products relevant to the collection you're querying. Otherwise if you have millions of products, the derived table would become so large that it would create a different performance problem.