sqlmysqlfull-text-searchaggregate-functions

MySQL full-text search with aggregate functions


I have a website connected to a MySQL database of songs, and am working to improve full-text search on the site. A simple query with MATCH() works as expected, but as soon as I add an aggregate function to the SELECT statement, MATCH() no longer returns relevant results.

The query I'm using is somewhat complicated, but I was able to boil it down to a minimal example:

  1. Create a table with an FTS index, and insert three songs:
CREATE TABLE `Song` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `title` varchar(300) DEFAULT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `FTS_Songs` (`title`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;

INSERT INTO Song (`title`)
VALUES ('The Morning Breaks'), ('The Spirit of God'), ('Now Let Us Rejoice');
  1. Run a simple query with MATCH(). This works correctly, bringing the most relevant song (“The Spirit of God”) to the top:
SELECT id, title,
  MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
  MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance
FROM Song
GROUP BY id
ORDER BY exactTitleRelevance DESC, titleRelevance DESC;

Correct result: The row for "The Spirit of God" has a high relevance score, and other rows have a relevance of 0. enter image description here

  1. Add any aggregate function (my real query has a GROUP_CONCAT() to aggregate data from a joined table; in this example, I’m using COUNT() without any joins for simplicity). This does not work correctly. As soon as I add an aggregate function, MATCH() breaks and returns inaccurate results:
SELECT id, title,
  MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
  MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance,
  COUNT(*) AS aggregateColumn
FROM Song
GROUP BY id
ORDER BY exactTitleRelevance DESC, titleRelevance DESC;

Incorrect result: The row for "The Spirit of God" has a relevance score of 0, and a different row that doesn't match the query has a high relevance score. enter image description here

What am I doing wrong? Is it possible to use MATCH() and aggregate functions like COUNT() or GROUP_CONCAT() together in the same query?

I’ve tried and gotten the same results on two different servers – one with MySQL 8.0.37, and the other with MySQL 8.3.0.


Solution

  • You can use EXPLAIN to check how MySQL executes the query:

    EXPLAIN SELECT id, title,
      MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
      MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance,
      COUNT(*) AS aggregateColumn
    FROM Song
    GROUP BY id
    ORDER BY exactTitleRelevance DESC, titleRelevance DESC;
    

    This will give:

    id select_type table partitions type possible_keys key key_len ref rows filtered Extra
    1 SIMPLE Song NULL index PRIMARY,FTS_Songs PRIMARY 4 NULL 3 100.00 Using temporary; Using filesort

    The aggregate function COUNT() along with GROUP BY changes how MySQL processes the query. MySQL uses "id" (PRIMARY) as an index instead of FTS_Songs (FULLTEXT) causing MATCH() to not work as expected.

    To make sure MATCH() calculates relevance accurately:

    e.g.

    WITH lookup AS (
    SELECT 
      id,
      title,
      MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE) AS exactTitleRelevance,
      MATCH(title) AGAINST ('+spirit* +god*' IN BOOLEAN MODE) AS titleRelevance
    FROM Song
    WHERE MATCH(title) AGAINST ('"the spirit of god"' IN BOOLEAN MODE)
    )
    SELECT 
      l.id, 
      l.title,
      l.exactTitleRelevance,
      l.titleRelevance,
      COUNT(*) AS aggregateColumn
    FROM  lookup l
    GROUP BY  l.id,  l.title, l.exactTitleRelevance,  l.titleRelevance;