python mysql encoding pymysql mysqlbinlog

Pymysql UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3


I'm using PyMySQL with Binlog2sql. Everything works fine with English characters. The connection settings I'm using are:

conn_setting = {'host': args.host, 'port': args.port, 'user': args.user, 'passwd': args.password, 'charset': 'utf8'} # ISO-8859-1 utf8mb4

But when I use it with Unicode characters (Arabic), I get this error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 70: invalid continuation byte

The database charset is utf8mb4.

I tried other encodings, such as ISO-8859-1 and utf8mb4, but with no luck. The PyMySQL documentation doesn't specify any charset.

System configuration: PyMySQL 0.9.3, Python 3.10, MySQL 8, Windows 11 or Linux (Ubuntu 20), Binlog2sql.

UPDATE #1: The string I'm trying to decode:

b"INSERT INTO `db1`.`t3`(`idt3`, `t3col`) VALUES (56, '\xc7\xed');"

with this code

str= str.decode("utf-8")

When I decode with Windows-1256 on a Windows machine it works fine, but on a Linux machine it returns different text without raising an error.
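To make the failure concrete, here is a minimal sketch that decodes the byte string from UPDATE #1 with both codecs. The variable names are illustrative only and are not taken from Binlog2sql.

```python
# The byte string from UPDATE #1; \xc7\xed are two single-byte cp1256 characters.
raw = b"INSERT INTO `db1`.`t3`(`idt3`, `t3col`) VALUES (56, '\xc7\xed');"

# 0xC7 looks like the start of a two-byte UTF-8 sequence, but 0xED is not a
# valid continuation byte, so the UTF-8 decoder raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print("utf-8 failed:", exc)

# Decoding with the codec the bytes were actually written in succeeds.
text = raw.decode("cp1256")
print(text)
```

Passing the codec name explicitly gives the same result on Windows and Linux, because `bytes.decode` does not consult the system locale.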

UPDATE #2: This library actually creates a temp file, stores the queries in it, and finally reads them back from that file, which is why the data from the database gets messed up. The file on Windows is ANSI encoded. Using cp1256, as @Rick James suggested, solves it for both Windows and Linux.
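The platform difference is consistent with how Python 3.10 opens text files: when no `encoding` is given, `open()` uses the locale's preferred encoding, which is typically an ANSI code page such as cp1256 on an Arabic Windows setup and UTF-8 on most Linux systems. The sketch below is only an illustration of writing and reading a temp file with an explicit encoding; it is an assumption about the kind of fix involved, not Binlog2sql's actual code.

```python
import os
import tempfile

# Sample statement containing the two Arabic letters from the question.
sql = "INSERT INTO `db1`.`t3`(`idt3`, `t3col`) VALUES (56, '\u0627\u064a');"

# Hypothetical temp file, standing in for the one the library creates.
fd, path = tempfile.mkstemp(suffix=".sql")
os.close(fd)
try:
    # Name the encoding on both the write and the read so the result does
    # not depend on the operating system's default.
    with open(path, "w", encoding="cp1256") as f:
        f.write(sql)
    with open(path, "r", encoding="cp1256") as f:
        round_tripped = f.read()
    # Inspect the raw bytes that ended up on disk.
    with open(path, "rb") as f:
        raw_bytes = f.read()
finally:
    os.remove(path)

print(round_tripped == sql)
```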


Solution

  • In cp1256 C7ED maps to 'اي' -- Is this what you were hoping for?

    If so, then establish that the client is using CHARACTER SET cp1256 in the connection parameters, or by running `SET NAMES cp1256` as the first SQL statement after connecting.

    You can declare the columns as either character set cp1256 or character set utf8. MySQL will convert between the client's encoding (cp1256) and the column's (cp1256 or utf8, as you choose in CREATE TABLE).
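A minimal sketch of both options with PyMySQL, assuming a reachable server. The helper names, host, and credentials are placeholders invented for illustration; only the `charset` key and the `SET NAMES` statement come from the answer above.

```python
def build_conn_setting(host, port, user, password, charset="cp1256"):
    """Build settings shaped like the question's conn_setting dict (hypothetical helper)."""
    return {"host": host, "port": port, "user": user,
            "passwd": password, "charset": charset}


def connect_with_cp1256(conn_setting):
    """Open a connection and declare the client encoding (hypothetical helper)."""
    import pymysql  # imported here so the rest of the sketch runs without a server

    # Option 1: the charset key in the connection parameters.
    conn = pymysql.connect(**conn_setting)
    # Option 2: SET NAMES as the first statement; redundant if charset is already set.
    with conn.cursor() as cur:
        cur.execute("SET NAMES cp1256")
    return conn


# Placeholder values; replace with real connection details.
settings = build_conn_setting("127.0.0.1", 3306, "user", "secret")
print(settings["charset"])
```

Calling `connect_with_cp1256(settings)` would then open the connection. Either option alone should be enough; both are shown only to mirror the two alternatives in the answer.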