mysqlsqloptimizationdatabase-scan

Mysql optimize to avoid table scan


Please consider the following table:

_____________________
|   sentence_word   |
|---------|---------|
| sent_id | word_id |
|---------|---------|
| 1       | 1       |
| 1       | 2       |
| ...     | ...     |
| 2       | 4       |
| 2       | 1       |
| ...     | ...     |

With this table structure I want to store the words of sentences. Now I want to find out which words are together with a specific word in a sentence. The result should look like this:

_____________________
| word_id | counted |
|---------|---------|
| 5       | 1000    |
| 7       | 800     |
| 3       | 600     |
| 1       | 400     |
| 2       | 100     |
| ...     | ...     |

The query Looks like the following:

SELECT
    word_id,
    COUNT(*) AS counted
FROM sentence_word
WHERE sentence_word.sent_id IN (SELECT
    sent_id
    FROM sentence_word
    WHERE word_id = [desired word]
)
AND word_id != [desired word]
GROUP BY word_id
ORDER BY counted DESC;

The query is working as it should but it always scans the full table. I created an index for sent_id and word_id. Do you have any ideas to optimize it that it doesn't Need to scan the full table all the time?


Solution

  • You could try a self join like this:

    SELECT COUNT(DISTINCT sw1.word_id)
    FROM sentence_word sw1
    JOIN sentence_word sw2 ON (
        sw1.sent_id = sw2.sent_id
        AND sw2.word_id = [your word id]
    )
    WHERE sw1.word_id != [your word id]
    

    or perhaps even better

    SELECT COUNT(DISTINCT sw1.word_id)
    FROM sentence_word sw1
    JOIN sentence_word sw2 ON (
        sw1.sent_id = sw2.sent_id
        AND sw2.word_id = [your word id]
        AND sw2.word_id != sw1.word_id
    )