mysqlregexmatch-against

MySQL MATCH() AGAINST() vs. REGEXP for matching whole words


I am trying to optimise searching in my dictionary (109,000 entries, MyISAM, FULLTEXT), and I am now comparing the performance of MATCH() AGAINST() with that of REGEXP '[[:<:]]keyword1[[:>:]]' AND table.field REGEXP '[[:<:]]keyword2[[:>:]]'.

Using two keywords, I get (inside PhpMyAdmin) 0.0000 seconds or 0.0010 seconds for the MATCH() AGAINST() query vs. 0.1962 seconds or 0.2190 seconds for the regex query. Is speed the only indicator that matters here? Which query should I prefer (both appear to yield the exact same results)? Is it the obvious – the faster one?

Here are the full queries:

SELECT * FROM asphodel_dictionary_unsorted 
JOIN asphodel_dictionary_themes ON asphodel_dictionary_unsorted.theme_id = asphodel_dictionary_themes.theme_id 
LEFT JOIN asphodel_dictionary_definitions ON asphodel_dictionary_unsorted.term_id = asphodel_dictionary_definitions.term_id 
WHERE MATCH (asphodel_dictionary_unsorted.english) 
AGAINST ('+boiler +pump' IN BOOLEAN MODE)

and

SELECT * FROM asphodel_dictionary_unsorted 
JOIN asphodel_dictionary_themes ON asphodel_dictionary_unsorted.theme_id = asphodel_dictionary_themes.theme_id 
LEFT JOIN asphodel_dictionary_definitions ON asphodel_dictionary_unsorted.term_id = asphodel_dictionary_definitions.term_id 
WHERE asphodel_dictionary_unsorted.english REGEXP '[[:<:]]boiler[[:>:]]' 
AND asphodel_dictionary_unsorted.english REGEXP '[[:<:]]pump[[:>:]]' 
ORDER BY asphodel_dictionary_unsorted.theme_id, asphodel_dictionary_unsorted.english

Solution

  • The MATCH/AGAINST solution uses a FULLTEXT index, and it searches the index pretty efficiently.

    The REGEXP solution cannot use an index. It always forces a table-scan and tests every row with the regular expression. As your table grows, it will take longer to do REGEXP queries in linear proportion to the number of rows.

    I did a presentation Full Text Search Throwdown some years ago, where I compared fulltext-indexed approaches against LIKE and REGEXP. With sample data of 7.4 million rows, the REGEXP took 7 minutes, 57 seconds, whereas searching an InnoDB FULLTEXT index in boolean mode took 350 milliseconds — the MATCH/AGAINST query was 1,363 times faster.

    The difference grows even larger the more rows you have.