mariadbcollationdbeaver

MariaDB sensivity to accents but not case sensitive


Got a MariaDB version 10.6.18-MariaDB-0ubuntu0.22.04.1-log. I'm using DBeaver to connect.

Got a table with a single text field VARCHAR(50). It can't be null and it cannot contains duplicates.

My problem is that database thinks values like Berlin and Berlín (with accent) is the same. After reading some questions and documentations looks like I need to change collations:

Tried changing collations of connection and database to utf8mb4_unicode_ci but I'm not getting what i need.

Tried changing collations to utf8mb4_0900_as_ci but it raises error

SQL Error [1273] [HY000]: (conn=37227) Unknown collation: 'utf8mb4_0900_as_ci'
  (conn=37227) Unknown collation: 'utf8mb4_0900_as_ci'
  (conn=37227) Unknown collation: 'utf8mb4_0900_as_ci'

Just as example, look at this code:

INSERT INTO table (field_name) VALUES ('Berlin');
INSERT INTO table (field_name) VALUES ('BERLIN');
INSERT INTO table (field_name) VALUES ('Berlín');
INSERT INTO table (field_name) VALUES ('BERlín');

Rows 2 and 4 should raise error because of duplicates (and that's perfect!). But my database raises error on row 3 too because Berlin and Berlín is the same (when it should not because the accent).

I'm stuck and I cannot get a solution to this (ACCENT sensitive but CASE insensitive)

Update:

Output of SELECT VERSION();:10.6.18-MariaDB-0ubuntu0.22.04.1-log

Output of SHOW CREATE TABLE:

CREATE TABLE `dim_publishers` (
  `PUBLISHER_ID` int(11) NOT NULL AUTO_INCREMENT,
  `PUBLISHER_NAME` varchar(255) NOT NULL,
  `PUBLISHER_SOURCE` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`PUBLISHER_ID`),
  UNIQUE KEY `PUBLISHER_NAME` (`PUBLISHER_NAME`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Solution

  • Due to customer refusing updating MariaDB (best option indeed). I followed @ysth and @Barmar comments and I did this:

    CREATE TABLE t_dummy(
      t_ID int(11) NOT NULL AUTO_INCREMENT,
      t_NAME varchar(255) NOT NULL,
      t_SOURCE varchar(50) DEFAULT NULL,
      L_t_NAME varchar(255) COLLATE 'utf8mb4_bin' AS (lower(t_NAME)) STORED,
      PRIMARY KEY (t_ID),
      UNIQUE KEY unique_t_name (L_t_NAME)
    );
    

    Not the best solution but it works as I need because:

    1. Using STORED will save the lowercased string in a column
    2. That column is utf8mb4_bin so it is accent sensitive and prevents duplicates

    I'm aware that duplicating info is not a good practice but now customer is really happy because he can use 2 columnas to search:

    1. If he uses column t_NAME and search for Berlin he will get Berlin and Berlín
    2. If he uses L_t_NAME he will find exact match (case insensitive) so he can use Berlin only if needed (same for Berlín).

    So I've been able to provide a solution to the problem, Thanks!