Got a MariaDB version 10.6.18-MariaDB-0ubuntu0.22.04.1-log. I'm using DBeaver to connect.
Got a table with a single text field VARCHAR(50). It can't be null and it cannot contains duplicates.
My problem is that database thinks values like Berlin
and Berlín
(with accent) is the same. After reading some questions and documentations looks like I need to change collations:
Tried changing collations of connection and database to utf8mb4_unicode_ci
but I'm not getting what i need.
Tried changing collations to utf8mb4_0900_as_ci
but it raises error
SQL Error [1273] [HY000]: (conn=37227) Unknown collation: 'utf8mb4_0900_as_ci'
(conn=37227) Unknown collation: 'utf8mb4_0900_as_ci'
(conn=37227) Unknown collation: 'utf8mb4_0900_as_ci'
Just as example, look at this code:
INSERT INTO table (field_name) VALUES ('Berlin');
INSERT INTO table (field_name) VALUES ('BERLIN');
INSERT INTO table (field_name) VALUES ('Berlín');
INSERT INTO table (field_name) VALUES ('BERlín');
Rows 2 and 4 should raise error because of duplicates (and that's perfect!). But my database raises error on row 3 too because Berlin
and Berlín
is the same (when it should not because the accent).
I'm stuck and I cannot get a solution to this (ACCENT sensitive but CASE insensitive)
Update:
Output of SELECT VERSION();
:10.6.18-MariaDB-0ubuntu0.22.04.1-log
Output of SHOW CREATE TABLE
:
CREATE TABLE `dim_publishers` (
`PUBLISHER_ID` int(11) NOT NULL AUTO_INCREMENT,
`PUBLISHER_NAME` varchar(255) NOT NULL,
`PUBLISHER_SOURCE` varchar(50) DEFAULT NULL,
PRIMARY KEY (`PUBLISHER_ID`),
UNIQUE KEY `PUBLISHER_NAME` (`PUBLISHER_NAME`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
Due to customer refusing updating MariaDB (best option indeed). I followed @ysth and @Barmar comments and I did this:
CREATE TABLE t_dummy(
t_ID int(11) NOT NULL AUTO_INCREMENT,
t_NAME varchar(255) NOT NULL,
t_SOURCE varchar(50) DEFAULT NULL,
L_t_NAME varchar(255) COLLATE 'utf8mb4_bin' AS (lower(t_NAME)) STORED,
PRIMARY KEY (t_ID),
UNIQUE KEY unique_t_name (L_t_NAME)
);
Not the best solution but it works as I need because:
utf8mb4_bin
so it is accent sensitive and prevents duplicatesI'm aware that duplicating info is not a good practice but now customer is really happy because he can use 2 columnas to search:
t_NAME
and search for Berlin
he will get Berlin
and Berlín
L_t_NAME
he will find exact match (case insensitive) so he can use Berlin
only if needed (same for Berlín
).So I've been able to provide a solution to the problem, Thanks!