mysqlindexingin-operatortextmatching

mySQL problems with text matches using IN statement


I have a largish table (2 million rows), with a column containing text identifiers (these are latin names of species, Homo_sapiens, Tyranosaurus_rex, etc.)

I have another table containing latin names and "common" names of species, and I can query this to get a small selection (~140 names) of latin names, some of which map into the first table. I want to get the rows in the first table whose names map exactly to this small selection. The query I'm using to get the small selection (only 140 rows) runs fast, since the common name has a mySQL 'FULLTEXT" index

select distinct latin_name from common_names_table 
  where match(common_name) against('+*mo*' in boolean mode)

But then if I try to use the SQL IN operator to match these into the large, 2 million row table, it takes many minutes,

select latin_name,popularity from big_table 
 where latin_name in (
  select distinct latin_name from common_names_table 
    where match(common_name) against('+*mo*' in boolean mode)
  )
 ORDER BY popularity DESC LIMIT 50;

That is true even though I have set both a full text and a normal index on the latin_name column.

CREATE FULLTEXT INDEX name_fulltext_index ON big_table (latin_name);
CREATE INDEX name_index          ON big_table (latin_name);

How can I speed this up? Is there a problem using the IN operator with indexed text fields? If so, is there some special sort of "exact match" index I can use for text fields? The latin_name fields are both of type "VARCHAR" and of max length 190 in the small table and 200 in the large, if that makes any difference.

Thanks for any help


As requested - here are the table definitions:

CREATE TABLE `big_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `parent` int(11) NOT NULL,
  `latin_name` varchar(200) DEFAULT NULL,
  `popularity` double DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `pop_index` (`popularity`),
  KEY `name_index` (`latin_name`),
  FULLTEXT KEY `name_fulltext_index` (`latin_name`)
) ENGINE=InnoDB AUTO_INCREMENT=1781766 DEFAULT CHARSET=utf8;

CREATE TABLE `common_name_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `latin_name` varchar(190) CHARACTER SET utf8mb4 NOT NULL DEFAULT '',
  `common_name` varchar(190) CHARACTER SET utf8mb4 NOT NULL,
  PRIMARY KEY (`id`),
  KEY `name_index` (`latin_name`),
  FULLTEXT KEY `common_name_index` (`common_name`)
) ENGINE=InnoDB AUTO_INCREMENT=2024 DEFAULT CHARSET=utf8;

Solution

  • AHA - thanks to @krishKM asking for the definitions, I've found the problem. The character set encoding of the two columns that I am trying to match is different: one is the default UTF8 in mySQL, the other is the 'proper' 4 byte utf8mb4 encoding.

    If I set latin_name to the same character encoding in both tables, the query takes ~20 milliseconds instead of 5 minutes.