mysqlrlike

MySQL REGEXP acting with case sensitivity without BINARY mode?


I'm quite confused. I have a source string in the database; some HTML:

"body": "<html><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body><div dir=\"auto\">Nope no attachment</div><br><div class=\"gmail_quote\"><div dir=\"ltr\" class=\"gmail_attr\">

Here's the SELECT:

SELECT * FROM table1 WHERE column_details REGEXP '(nope.no).attach';

When I select this with REGEXP of '(nope.no).attach' it fails to match. When I uppercase the Nope, it matches. When I LCASE(column_details) and go back to lower-case nope, it matches. What's going on here? My understanding is REGEXP is case insensitive, and as far as I know I'm not enforcing binary mode... or is it binary by default? If so, how do I make it case insensitive and disable binary mode matching?

Thanks!


Solution

  • It's not that REGEXP is case-insensitive. It depends on the collation of the column. REGEXP can be case-sensitive or insensitive.

    mysql> set @h = '<html><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"></head><body><div dir=\"auto\">Nope no attachment</div><br><div class=\"gmail_quote\"><div dir=\"ltr\" class=\"gmail_attr\">';
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> select @h regexp '(nope.no).attach';
    +------------------------------+
    | @h regexp '(nope.no).attach' |
    +------------------------------+
    |                            1 |
    +------------------------------+
    1 row in set (0.00 sec)
    
    mysql> select @h collate utf8mb4_bin regexp '(nope.no).attach';
    +--------------------------------------------------+
    | @h collate utf8mb4_bin regexp '(nope.no).attach' |
    +--------------------------------------------------+
    |                                                0 |
    +--------------------------------------------------+
    

    I'd check the collation of your column:

    SELECT COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = 'table1' AND COLUMN_NAME = 'column_details';