Tags: python, python-2.7, utf-8, pymssql, cp1251

How to fix UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 when reading a SQL file


This is tx.sql:

 DECLARE @Cnt INT,
    @ParticipantID UNIQUEIDENTIFIER

SELECT ParticipantID INTO #ids
FROM dbo.rbd_Participants
/* sun */
WHERE surname='Пупкин'

This is the Python script:

import re

with open('tx.sql', 'r') as f:
    script = f.read().decode('utf8')
script = re.sub(r'\/\*.*?\*\/', '', script, flags=re.DOTALL)  # multiline comment
script = re.sub(r'--.*$', '', script, flags=re.MULTILINE)  # line comment

sql = []
do_execute = False
for line in script.split(u'\n'):
    line = line.strip()
    if not line:
        continue
    elif line.upper() == u'GO':
        do_execute = True

    else:
        sql.append(line)
        do_execute = line.endswith(u';')
        #print line


cur.execute(u'\n'.join(sql).encode('utf8'))  

Problem line: script = f.read().decode('utf8')

UnicodeDecodeError: 'utf8' codec can't decode byte 0xc0 in position 134: invalid start byte
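The error can be reproduced without the file. This is a minimal sketch, assuming tx.sql was saved as cp1251 (Windows-1251), which the tags and the later decode('cp1251') attempt suggest. The byte string below is the cp1251 encoding of the surname, not data copied from the actual file:

```python
# -*- coding: utf-8 -*-
# Assumption: the file is cp1251-encoded; these are the cp1251 bytes of u'Пупкин'.
raw = b'\xcf\xf3\xef\xea\xe8\xed'

try:
    raw.decode('utf-8')
    utf8_ok = True
except UnicodeDecodeError:
    # In cp1251 each of these bytes is one Cyrillic letter. In UTF-8 they are
    # lead bytes that need continuation bytes (or are not valid at all),
    # so decoding the same bytes as UTF-8 fails.
    utf8_ok = False

# Decoding with the encoding the bytes were written in works.
surname = raw.decode('cp1251')
```

Here utf8_ok ends up False and surname holds the expected Cyrillic text, which matches the behaviour described in the question.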

I have tried

script = f.read().decode('cp1251')

but then the statement passed to

cur.execute(u'\n'.join(sql).encode('utf8'))

comes out garbled when I print it:

print (u'\n'.join(sql)).encode('utf8')

Output:
DECLARE @Cnt INT,
@ParticipantID UNIQUEIDENTIFIER
SELECT ParticipantID INTO #ids
FROM dbo.rbd_Participants
WHERE surname='РџСѓРїРєРёРЅ'

How do I get the correct line? Instead of

WHERE surname='РџСѓРїРєРёРЅ'

the string should be

WHERE surname='Пупкин'
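The garbled form can be reproduced from the correct one. This is a small sketch, assuming the console displays bytes as cp1251 (an assumption about the environment; the variable names are only illustrative):

```python
# -*- coding: utf-8 -*-
correct = u'Пупкин'

# .encode('utf8') turns the text into UTF-8 bytes, two bytes per Cyrillic letter.
utf8_bytes = correct.encode('utf-8')

# A cp1251 console (assumed) shows every byte as a separate character.
garbled = utf8_bytes.decode('cp1251')   # u'РџСѓРїРєРёРЅ'

# The process is reversible, so no data was lost, only displayed wrongly.
restored = garbled.encode('cp1251').decode('utf-8')
```

Six letters become twelve characters, which is the pattern visible in the printed WHERE clause.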


Solution

  • You are reading the data correctly once you decode with cp1251. It is your print statement that is incorrect:

    print (u'\n'.join(sql)).encode('utf8')
    

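    Your terminal or console is not using UTF-8, so it displays the UTF-8 bytes as the wrong characters (the garbled surname is what UTF-8 bytes look like when shown as cp1251). Don't encode before printing; print the unicode string and leave the encoding to Python.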
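A possible way to put this together is sketched below, assuming the file really is cp1251. The helper name read_sql_script and the temporary demo file are hypothetical and only serve the illustration; io.open is used because it decodes while reading on both Python 2 and Python 3. The comment-stripping patterns are the ones from the question, written without the redundant slash escapes.

```python
# -*- coding: utf-8 -*-
import io
import os
import re
import tempfile


def read_sql_script(path, encoding='cp1251'):
    """Hypothetical helper: read a SQL file as text and strip comments."""
    # io.open returns decoded text, so no separate .decode() call is needed.
    with io.open(path, 'r', encoding=encoding) as f:
        script = f.read()
    script = re.sub(r'/\*.*?\*/', '', script, flags=re.DOTALL)   # /* ... */
    script = re.sub(r'--.*$', '', script, flags=re.MULTILINE)    # -- ...
    return script


# Demo only: write a small cp1251 file, read it back, then clean up.
fd, demo_path = tempfile.mkstemp(suffix='.sql')
os.close(fd)
with io.open(demo_path, 'w', encoding='cp1251') as f:
    f.write(u"SELECT 1 /* sun */\nWHERE surname='Пупкин' -- note\n")

script = read_sql_script(demo_path)
os.remove(demo_path)
```

The resulting script is a unicode string. Printing it directly (print script in Python 2, with no .encode call) lets Python pick the console's encoding, which is what the answer above recommends.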