I am trying to subtract query results from two columns.
Table:
id | word1 | lang1 | word2 | lang2 |
----+-----------+-------+-----------+-------+
1 | car | 1 | car | 15 |
2 | table | 1 | table | 15 |
3 | Chair | 1 | cahair | 13 |
4 | CDplayer | 15 | CDplayer | 1 |
5 | car | 1 | car | 13 |
I would like to get all the words in word1 in language 1 that are not translated yet into language 12. So in this case it would be Chair
There are 3 million rows in the table and the following query takes 1 minute to run:
SELECT DISTINCT word1
FROM `translations`
WHERE lang1 = 1
AND lang2 != 15
AND NOT IN (SELECT word1 FROM `translations` WHERE lang2 == 15)
LIMIT 10
Doing a select on both rows separately is very fast 0.006s and then I could use array_diff()
in PHP to subtract them from one another, but there probably is a simpler way to do it directly in MySQL.
SELECT
origin.word1
FROM
( SELECT DISTINCT word1
FROM tableX
WHERE lang1 = 1
) AS origin
WHERE
NOT EXISTS
( SELECT *
FROM tableX AS trans
WHERE trans.lang1 = 1
AND trans.lang2 = 15
AND trans.word1 = origin.word1
) ;
I would add an index on (lang1, word1)
and an index on (lang1, lang2, word1)
before running these queries.
You could also try this variation (and check both Explain plans):
SELECT DISTINCT
word1
FROM
tableX AS origin
WHERE
lang1 = 1
AND
NOT EXISTS
( SELECT *
FROM tableX AS trans
WHERE trans.lang1 = 1
AND trans.lang2 = 15
AND trans.word1 = origin.word1
) ;