mariadbrlike

Why does RLIKE match á and è in binary charset in MariaDB 10.2


DB: MariaDB 10.2.13, 10.3.39.

Operator RLIKE return true when checking if in my binary string á exist char è:

convert('á' using binary) RLIKE '[è]': 1

Query:

(root@127.0.0.1:3306) [test]> select char(50081), char(50088), hex(char(50081)), hex('á'), char(50081) RLIKE '[è]', char(50088) RLIKE '[á]', convert(char(50088) using utf8) RLIKE '[á]', convert(char(50088) using utf8mb4) RLIKE '[á]', convert('á' using binary) RLIKE '[è]', 'á' RLIKE '[è]' \G
*************************** 1. row ***************************
                                    char(50081): á
                                    char(50088): è
                               hex(char(50081)): C3A1
                                      hex('á'): C3A1
                       char(50081) RLIKE '[è]': 1
                       char(50088) RLIKE '[á]': 1
   convert(char(50088) using utf8) RLIKE '[á]': 0
convert(char(50088) using utf8mb4) RLIKE '[á]': 0
        convert('á' using binary) RLIKE '[è]': 1
                              'á' RLIKE '[è]': 0
1 row in set (0,00 sec)

Solution

  • Look at the HEX codes of 'á' and 'è', which are 'C3A1' and 'C3A8'.

    select hex('á'), hex('è');
    

    In binary context values are compared byte by byte. Which makes your query equivalent to

    select concat(0xC3, 0xA1) RLIKE concat('[', 0xC3, 0xA8, ']');
    

    The byte 'C3' is contained in 'C3A1', thus the match succeeds.