mysqlperformanceintersectionquery-tuning

Mysql intersection query performance


I am quite new to mysql. I have 2 identical mysql tables which have 50K rows (70 columns) each. Those tables are updated everyday by a datafeed. I need to execute some nested queries like intersections / substractions etc.

One of the queries I try to use is as below. But it doesn't work properly. Either it takes 5 min. to 10 min. (through terminal) or it does not respond back.

SELECT * 
FROM   table1 
WHERE  table1.sku IN (SELECT t1.sku 
                      FROM   ((SELECT DISTINCT sku 
                               FROM   table2) 
                              UNION ALL 
                              (SELECT DISTINCT sku 
                               FROM   table1)) AS t1 
                      GROUP  BY sku 
                      HAVING Count(*) >= 2) 

How can I make it work faster/properly? How should I configure the tables/columns (index, primary key etc.) Or do I need to make any tuning on the mysql server?

I tried several things. I created indexes on the 'sku' which are varchar(75) columns. My database server runs on a 1 CoreProcessor (Digital Ocean) server with 512MB Memory.

--- query with 'EXPLAIN'

+----+--------------------+-----------------------+-------+---------------+---------+---------+------+-------+---------------------------------+
| id | select_type        | table                 | type  | possible_keys | key     | key_len | ref  | rows  | Extra                           |
+----+--------------------+-----------------------+-------+---------------+---------+---------+------+-------+---------------------------------+
|  1 | PRIMARY            | table1                | ALL   | NULL          | NULL    | NULL    | NULL | 30260 | Using where                     |
|  2 | DEPENDENT SUBQUERY | <derived3>            | ALL   | NULL          | NULL    | NULL    | NULL | 65677 | Using temporary; Using filesort |
|  3 | DERIVED            | table2                | range | NULL          | sku_idx | 227     | NULL | 31016 | Using index for group-by        |
|  4 | UNION              | table1                | range | NULL          | sku     | 227     | NULL | 30261 | Using index for group-by        |
| NULL | UNION RESULT     | <union3,4>            | ALL   | NULL          | NULL    | NULL    | NULL |  NULL |                                 |
+----+--------------------+-----------------------+-------+---------------+---------+---------+------+-------+---------------------------------+

Solution

  • If I understand this particular query correctly, you are trying to display all the records from table1 which have a corresponding sku in table2.

    That can be achieved by a much simpler query:

    SELECT * 
    FROM   table1 
    WHERE  table1.sku IN (SELECT DISTINCT table2.sku FROM table2 )
    GROUP BY table1.sku
    

    Or, with joins:

    SELECT table1.* 
    FROM   table1 
    INNER JOIN table2 ON table1.sku = table2.sku
    GROUP BY table1.sku
    

    This should work in an instant if you have indexes on table1.sku and table2.sku