Tags: python, linux, mariadb, collation

Error connecting to MariaDB from python: Unknown collation: 'utf8mb4_0900_ai_ci'


On my production server (Debian 12) I have a MariaDB database, 'sasquatch_index', that works as expected with a few Python scripts that interface with it. I am setting up a testing environment for my application and want to copy this database to my local system (Manjaro). On the production server I create the site_dump.sql file with a command like mariadb-dump -u [username] -p[password] sasquatch_index > ./site_dump.sql. I then retrieve this file from my local machine using rsync and import it into the local database with mariadb -u [user] -p sasquatch_index < ~/site_dump.sql. I can access this database and its contents via the mariadb client. So far so good.

Issues arise, however, when I try to use any of the Python scripts that interface with this database on my local machine (the testing environment I am setting up). These scripts use the 'mysql.connector' module, which was installed with the command pacman -S python-mysql-connector. For example, search.py exists to perform complex searches against the database. It is run with a command like search.py -s "hello world", and it fails with output like the following:

Error connecting to database: 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_ci'
Traceback (most recent call last):
  File "/home/josh/git/github/search-sasquatch/./search.py", line 270, in <module>
    main()
  File "/home/josh/git/github/search-sasquatch/./search.py", line 262, in main
    results = performSearch(arguments['searchString'], arguments['safe'], creds)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/josh/git/github/search-sasquatch/./search.py", line 185, in performSearch
    if conn is not None and conn.is_connected():
       ^^^^
UnboundLocalError: cannot access local variable 'conn' where it is not associated with a value

(Note the part of the output relating to "Unknown collation". The variable 'conn' is only assigned once a connection to the database succeeds, so the UnboundLocalError is a side effect of the failed connection.)
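The secondary UnboundLocalError can be avoided independently of the collation problem. The following is only a minimal sketch, not the actual code from search.py: the function signature, the 'url' and 'title' column names, and the failing_connect helper are all hypothetical. It shows the usual pattern of binding conn to None before the try block so that the cleanup code can always test it.

```python
def perform_search(connect, search_string):
    """Run a search; `connect` is any callable that returns a DB connection.

    Hypothetical stand-in for performSearch() in search.py.
    """
    conn = None  # bound before the try block, so cleanup can always test it
    results = []
    try:
        conn = connect()
        cursor = conn.cursor()
        # 'url' and 'title' are assumed column names, used only for illustration
        cursor.execute(
            "SELECT url FROM sites WHERE title LIKE %s",
            (f"%{search_string}%",),
        )
        results = cursor.fetchall()
    except Exception as err:
        print(f"Error connecting to database: {err}")
    finally:
        # No UnboundLocalError here, even if connect() raised
        if conn is not None:
            conn.close()
    return results


def failing_connect():
    """Simulates the failed connection from the traceback above."""
    raise RuntimeError("1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_ci'")
```

With this structure, a failed connection prints the database error and returns an empty result list instead of raising a second, unrelated exception.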

The same "Unknown collation" error appears when using another of these Python scripts, 'extractor.py', which appends new data to the 'sites' table in the 'sasquatch_index' database. Running this script on the local system produces output like this:

Parsing https://example.com/index.php...
Error updating the database for url 'https://example.com/index.php': 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_ci'
Parsing https://josh.example.com...
Error updating the database for url 'https://josh.example.com': 1273 (HY000): Unknown collation: 'utf8mb4_0900_ai_ci'

These scripts work as expected in the production environment, but locally they produce the errors shown above.

I've tried modifying the command I ran on the production server so that it explicitly does not use the collation 'utf8mb4_0900_ai_ci', i.e. mariadb-dump --default-character-set=utf8mb4 --skip-set-charset -u [user] -p[password] sasquatch_index > site_dump.sql

I've also searched the 'site_dump.sql' file for the string "utf8mb4_0900_ai_ci" and found that it never occurs in the file. Even so, the errors persist.

Most recently, I've tried running 'extractor.py' without importing the dump at all (I created the database and table using other scripts from my project), and I still get the same errors. 'extractor.py' is still unable to add any data to the 'sites' table.
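Because the string never appears in the dump and the error persists with a freshly created database, the collation name is most likely sent by the client library when it connects, not read from the data. One commonly reported workaround is to name a collation the MariaDB server does know when connecting. The following is a minimal sketch under assumptions: that the installed mysql.connector accepts charset and collation keyword arguments in connect(), and that the credentials are held in a dict. The helper name and the credential keys are hypothetical.

```python
def build_connect_kwargs(creds, collation="utf8mb4_general_ci"):
    """Return keyword arguments for mysql.connector.connect().

    `creds` is assumed to be a dict such as
    {"user": ..., "password": ..., "host": ..., "database": ...}.
    """
    kwargs = dict(creds)  # copy, so the caller's dict is not modified
    # Ask for a collation that MariaDB recognises, instead of relying
    # on the connector's built-in default.
    kwargs.setdefault("charset", "utf8mb4")
    kwargs.setdefault("collation", collation)
    return kwargs


# Usage (needs a reachable server, so it is left commented out):
# import mysql.connector
# conn = mysql.connector.connect(**build_connect_kwargs(creds))
```

Whether this is enough depends on the connector version installed, so treat it as something to try, not as a guaranteed fix.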


Solution

  • @danblack made a suggestion that solved these problems: switch from the mysql.connector library to the library called mariadb. More information about this library can be found here.

    I would like to note, however, that this library doesn't have an is_connected() method, so I had to make some minor adjustments to other parts of my code to account for this. Other than that, things work as expected now.

    This thread was also very useful.
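For anyone making the same switch, the missing is_connected() call can be replaced with a small helper. This is a minimal sketch and not necessarily the adjustment made above. It assumes the connection object exposes a ping() method that raises an exception when the connection is gone, which is how MariaDB Connector/Python documents it. The helper itself is hypothetical.

```python
def is_connected(conn):
    """Return True if `conn` looks like a live database connection.

    Works with any object that has a ping() method raising on failure.
    """
    if conn is None:
        return False
    try:
        conn.ping()  # assumed to raise if the server cannot be reached
    except Exception:
        return False
    return True
```

A check such as `if conn is not None and conn.is_connected():` then becomes `if is_connected(conn):`, where conn comes from mariadb.connect(user=..., password=..., host=..., database=...).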