pythonsqlsqlitemariadb

Python + Sqlite3 dump, source to MariaDB: Unknown collation: 'utf8mb4_0900_ai_ci' (db noob)


I'm following Head First Python (3ed). I'm at the last chapter and have a bug I can't get past.

I've got an sqlite3 database that I need to port to MariaDB. I've got the schema and data in separate files:

sqlite3 CoachDB.sqlite3 .schema > schema.sql
sqlite3 CoachDB.sqlite3 '.dump swimmers events times --data-only' > data.sql

I've installed MariaDB via apt, granted privileges to a user, and logged in to get MariaDB prompt:

mariadb -u swimuser -p swimDB

source schema.sql;
source data.sql;

Now I run my python app locally that connects to the MariaDB but, and I'm not sure where exactly, the call fails with:

mysql.connector.errors.DatabaseError: 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_ci'

Back at MariaDB prompt:

show collation like 'utf8%'; # shows no collation 'utf8mb4_0900_ai_ci'

select * from INFORMATION_SCHEMA.SCHEMATA; # shows default is is utf8mb4 / utf8mb4_uca1400_ai_ci

At shell prompt, file -i schema.sql shows it's plain-text in us-ascii. I tried opening schema.sql and data.sql in notepadqq and saving them as utf8 but I still got the same error. I dropped the database and recreated it with:

CREATE DATABASE swimDB
    DEFAULT CHARACTER SET utf8mb4
    DEFAULT COLLATE utf8mb4_unicode_520_ci;

...then again sourced schema.sql and data.sql and still got the same error.

I saw a post where someone asked for this so...

MariaDB [swimDB]> SHOW VARIABLES WHERE VALUE LIKE 'utf%';
+--------------------------+-------------------------------+
| Variable_name            | Value                         |
+--------------------------+-------------------------------+
| character_set_client     | utf8mb3                       |
| character_set_collations | utf8mb4=utf8mb4_uca1400_ai_ci |
| character_set_connection | utf8mb3                       |
| character_set_database   | utf8mb4                       |
| character_set_results    | utf8mb3                       |
| character_set_server     | utf8mb4                       |
| character_set_system     | utf8mb3                       |
| collation_connection     | utf8mb3_general_ci            |
| collation_database       | utf8mb4_uca1400_ai_ci         |
| collation_server         | utf8mb4_uca1400_ai_ci         |
| old_mode                 | UTF8_IS_UTF8MB3               |
+--------------------------+-------------------------------+

So I guess there's a character set or encoding problem with the data (?).

At this point I'm lost, madly searching the interwebs for clues. Any help appreciated and sorry for the long post :)


Solution

  • MySQL Connector appears to be forcing a collation that does not exist in MariaDB (utf8mb4_0900_ai_ci).

    Per comments it appears that attempting to force a later MariaDB collation utf8mb4_uca1400_ai_ci on the connection appears to not resolve the connection.

    MariaDB Connector/Python its actually tested with MariaDB server. Conforming to the Python Database API Specification v2.0 (PEP 249) it should be compatible with the MySQL (Python) Connector as a replacement connection module for your MariaDB server and application.