mysqlfull-text-searchmatch-againstaccent-insensitive

How to make an accent insensitive `MATCH() AGAINST()` sentence?


I have a table called products which has a varchar column called name encoded in latin1_swedish_ci.

I'm trying to make a search query in order to filter products by name using the MATCH() AGAINST() sentence that look like this:

SELECT *
FROM products
WHERE MATCH(name) AGAINST('*search_string*' IN BOOLEAN MODE)

It works really good varchars without accents, however, if I have a product called Colágeno and the search input is Colageno, the query would not catch the product.

So I tried what is showed in this SO answer without success.

SELECT *
FROM products
WHERE MATCH(CONVERT(BINARY(name) USING utf8)) AGAINST('*Colageno*' IN BOOLEAN MODE)

I got the error:

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'CONVERT(BINARY(name) USING utf8)) AGAINST('*Colageno*' IN BOOLEAN MODE)' at line 3

I only want to catch special characters like á, é, í, ó, ú.

Thanks in advance!


Solution

  • Solved!

    After some discussion in the comments, I realized that the error was not in the MATCH() AGAINST() statement, since it does not distinguish diacritics by default.

    So the problem had to do with how the diacritics were stored in MySQL, in my case, they were stored like this COLÁGENO -> COLÃ<0x81>GENO. Therefore, it was necessary to find out how to save the tildes correctly without corrupting the table.

    Encodings

    I tried making encoding changes by executing in phpmyadmin the instruction:

    ALTER TABLE products CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
    

    However, there were no changes in performance.

    I then tried changing the encoding in the exported table file by changing DEFAULT CHARSET=latin1 to DEFAULT CHARSET=utf8mb4, but no change in results either.

    Modify the accents manually

    My other attempt was, once the encoding change was made, to manually modify the cells that had characters like Ã<0x81> by their corresponding character with tilde Á. But sadly this seemed to corrupt queries to the table (I was still able to access the other tables normally).

    So I thought about what masterguru said in the comments about encoding changes altering the way scripts connect to the table, and apparently when I manually modified a character to put the tilde, the scripts kept accessing the table. table with the previous encoding.

    Solution

    The scripts were in PHP so I had to find the solution in that language.

    I found this answer in English SO where it said how to save tildes in the database correctly. To do this, you had to write...

    mysqli_set_charset($connection, "utf8");
    

    ...this after the connection to the database. Finally, I had to change the rare characters in my database to their corresponding tilde character for the MATCH AGAINST to work, and voila!


    Many thanks to masterguru, Triby and aeportugal for the help provided in the comments!

    Original post: https://es.stackoverflow.com/questions/511745/como-hacer-que-la-b%c3%basqueda-match-against-ignore-los-tildes-o-acentos