mysqlsqlindexingquery-optimizationfulltext-index

SQL Match Against Fulltext Index is slow when searching common terms


Simple PHP search page with three fields querying a MySQL database with millions of rows. The data consists of profiles, mostly USA.

Fulltext indexing increases speed by a factor of 9.6 when the search term is rare (eg "Canada").

Fulltext indexing slows speed by a factor of 3.6 when the search term is common (eg "United States").

Is there some way to enjoy the benefits of both technique? Maybe searching only a small portion of the table and constructing a "MATCH AGAINST" query if the term is rare (found in less than half of the records) and a standard "LIKE" query if the term is common? That sounds like a convoluted solution. I want a better way.

Here is my SQL and search times:

SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%canada%' 

1.14 seconds (slow)

    SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +canada' IN BOOLEAN MODE) AND country LIKE '%canada%

0.12 seconds (fast)

SELECT COUNT(1) FROM `mytable` WHERE `country` LIKE '%united states%'

1.09 seconds (comparatively fast)

 SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
AGAINST(' +united +states' IN BOOLEAN MODE) AND country LIKE '%united states%

3.9 seconds (slow)


Solution

  • My solution is to generate my own list of stopwords. I downloaded a sample of data and found the most common terms using Python. Common terms are excluded from MATCH queries through PHP. In my data, "United" and "States" are common terms:

    $SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
    AGAINST('+sales +united +states' IN BOOLEAN MODE) AND country LIKE '%united states% AND tagline LIKE '%sales%'"
    

    Becomes:

    $SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
    AGAINST('+sales' IN BOOLEAN MODE) AND country LIKE '%united states% AND tagline LIKE '%sales%'"
    

    When all of the search terms are "common" the MATCH statement is removed entirely:

    $SQL =  "SELECT COUNT(1) FROM `mytable` WHERE MATCH(tagline, location, country)
    AGAINST('+united +states' IN BOOLEAN MODE) AND country LIKE '%united states%"
    

    Becomes:

    $SQL =  "SELECT COUNT(1) FROM `mytable` WHERE country LIKE '%united states%"
    

    The best of both worlds.