I'm using pymysql with Binlog2sql. Everything works fine with English characters. The connection settings I'm using are:
conn_setting = {'host': args.host, 'port': args.port, 'user': args.user, 'passwd': args.password, 'charset': 'utf8'} # ISO-8859-1 utf8mb4
But when I use it with Unicode characters (Arabic), I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe3 in position 70: invalid continuation byte
The database charset is utf8mb4.
I tried other encodings such as ISO-8859-1 and utf8mb4, but with no luck. The pymysql documentation doesn't specify any charset.
System configuration: pymysql 0.9.3, Python 3.10, MySQL 8, Windows 11 or Linux (Ubuntu 20), Binlog2sql.
UPDATE #1: The string I'm trying to decode:
b"INSERT INTO `db1`.`t3`(`idt3`, `t3col`) VALUES (56, '\xc7\xed');"
with this code:
str= str.decode("utf-8")
When I use Windows-1256 it works fine on a Windows machine, but on a Linux machine it returns different text, without any errors.
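To make the failure concrete, here is a minimal sketch using the byte string from the update above. It only assumes that the two bytes `\xc7\xed` were produced by a cp1256 client; the variable names are illustrative. In UTF-8, `0xc7` announces a two-byte sequence, and `0xed` is not a valid continuation byte, which is why the utf-8 decode raises.

```python
# The raw statement from the question, as bytes.
raw = b"INSERT INTO `db1`.`t3`(`idt3`, `t3col`) VALUES (56, '\xc7\xed');"

# Decoding as UTF-8 fails: 0xc7 starts a 2-byte sequence,
# but 0xed is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    utf8_error = str(exc)

# Decoding with the encoding the bytes were actually written in succeeds.
text = raw.decode("cp1256")
print(utf8_ok)
print(text)
```

Decoding a given byte string with a named codec gives the same result on every platform, so a difference between Windows and Linux suggests the bytes themselves differ by the time they are decoded.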
UPDATE #2: This library actually creates a temp file, stores the queries in it, and finally reads them back from that file, which is why the data from the database gets messed up. On Windows the file is ANSI encoded. Using cp1256, as @Rick James suggested, solves it on both Windows and Linux.
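The temp-file effect can be sketched as follows. This is not Binlog2sql's actual code, just a hypothetical round trip through a temporary file. When `open()` is called without an `encoding` argument, Python falls back to the platform's locale default, which typically differs between an Arabic Windows install and a Linux box; passing the encoding explicitly on both the write and the read removes that dependency.

```python
import os
import tempfile

# Hypothetical statement containing the two Arabic letters from the question.
statement = "INSERT INTO `db1`.`t3`(`idt3`, `t3col`) VALUES (56, '\u0627\u064a');"

fd, path = tempfile.mkstemp(suffix=".sql")
os.close(fd)
try:
    # Write and read with the same explicit encoding instead of the locale default.
    with open(path, "w", encoding="cp1256") as f:
        f.write(statement)
    with open(path, "r", encoding="cp1256") as f:
        round_tripped = f.read()
    # Inspect the bytes that actually landed on disk.
    with open(path, "rb") as f:
        raw_bytes = f.read()
finally:
    os.remove(path)

print(round_tripped == statement)
print(b"\xc7\xed" in raw_bytes)
```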
In cp1256 C7ED maps to 'اي' -- Is this what you were hoping for?
If so, then establish that the client is using CHARACTER SET cp1256, either in the connection parameters or by issuing `SET NAMES cp1256` as the first SQL statement after connecting.
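A minimal sketch of the `SET NAMES` route, assuming `conn` is an already-open DB-API connection (for example one returned by `pymysql.connect(**conn_setting)`) whose cursors can be used as context managers. The helper name `use_client_charset` is made up for illustration.

```python
def use_client_charset(conn, charset="cp1256"):
    """Run SET NAMES as the first statement on an open connection.

    `conn` is assumed to be a DB-API style connection; the charset name
    is checked before being interpolated, since it cannot be passed as
    a bound parameter.
    """
    if not charset.replace("_", "").isalnum():
        raise ValueError(f"unexpected charset name: {charset!r}")
    with conn.cursor() as cur:
        cur.execute(f"SET NAMES {charset}")
```

The other route is to change `'charset': 'utf8'` to `'charset': 'cp1256'` in `conn_setting`; whether your installed pymysql version accepts that name is worth checking first.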
You can have the columns declared either CHARACTER SET cp1256 or CHARACTER SET utf8. MySQL will convert between the client's encoding (cp1256) and the column's (cp1256 or utf8, as you choose in CREATE TABLE).
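As a rough illustration of that conversion (done here in Python rather than by MySQL), the same two characters are simply different bytes in each encoding, and the mapping is lossless in both directions for these letters:

```python
# Bytes as a cp1256 client would send them.
client_bytes = b"\xc7\xed"

# The characters those bytes stand for.
text = client_bytes.decode("cp1256")

# The same characters as they would be stored in a utf8 column.
utf8_bytes = text.encode("utf-8")

# Converting back for a cp1256 client yields the original bytes.
back = utf8_bytes.decode("utf-8").encode("cp1256")

print(text, utf8_bytes.hex(), back == client_bytes)
```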