mysqldifference

Subtracting mysql query results


I am trying to subtract query results from two columns.

Table:

 id |   word1   | lang1 |   word2   | lang2 |
----+-----------+-------+-----------+-------+
  1 | car       |   1   | car       |  15   |
  2 | table     |   1   | table     |  15   |
  3 | Chair     |   1   | cahair    |  13   |
  4 | CDplayer  |  15   | CDplayer  |   1   |
  5 | car       |   1   | car       |  13   |

I would like to get all the words in word1 in language 1 that are not translated yet into language 12. So in this case it would be Chair

There are 3 million rows in the table and the following query takes 1 minute to run:

SELECT DISTINCT word1 
FROM `translations` 
WHERE lang1 = 1 
  AND lang2 != 15 
  AND NOT IN (SELECT word1 FROM `translations` WHERE lang2 == 15) 
LIMIT 10

Doing a select on both rows separately is very fast 0.006s and then I could use array_diff() in PHP to subtract them from one another, but there probably is a simpler way to do it directly in MySQL.


Solution

  • SELECT 
        origin.word1
    FROM 
        ( SELECT DISTINCT word1
          FROM tableX 
          WHERE lang1 = 1
        ) AS origin
    WHERE 
        NOT EXISTS
          ( SELECT *
            FROM tableX AS trans
            WHERE trans.lang1 =  1
              AND trans.lang2 = 15
              AND trans.word1 = origin.word1
          ) ;
    

    I would add an index on (lang1, word1) and an index on (lang1, lang2, word1) before running these queries.

    You could also try this variation (and check both Explain plans):

    SELECT DISTINCT
          word1
    FROM 
          tableX AS origin
    WHERE 
          lang1 = 1
      AND 
          NOT EXISTS
          ( SELECT *
            FROM tableX AS trans
            WHERE trans.lang1 =  1
              AND trans.lang2 = 15
              AND trans.word1 = origin.word1
          ) ;