I'm using the optparse module to retrieve a string value. optparse only supports str-typed strings, not unicode ones.
So let's say I start my script with:
./someScript --some-option ééééé
French characters such as 'é', being of type str, trigger UnicodeDecodeErrors when read in the code:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 99: ordinal not in range(128)
I played around a bit with the unicode built-in function, but either I get an error, or the character disappears:
>>> unicode('é');
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> unicode('é', errors='ignore');
u''
Is there anything I can do to use optparse to retrieve unicode/UTF-8 strings?
It seems that the string can be retrieved and printed OK, but when I then use it with SQLite (via the APSW module), cursor.execute("...") tries to convert it to unicode somehow, and that is where the error occurs.
Here is a sample program that causes the error:
#!/usr/bin/python
# coding: utf-8
import os, sys, optparse
parser = optparse.OptionParser()
parser.add_option("--some-option")
(opts, args) = parser.parse_args()
print unicode(opts.some_option)
Input is returned in the console encoding, so based on your updated example, use:
print opts.some_option.decode(sys.stdin.encoding)
unicode(opts.some_option) defaults to using ascii as the encoding.
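If you want to do the decoding in one place before handing the value to the database, something along these lines might work. This is only a sketch: to_text and parse are hypothetical helper names of mine, not part of optparse, and falling back to UTF-8 when sys.stdin.encoding is unavailable (for example when input is piped) is an assumption you may need to adjust for your terminal.

```python
import optparse
import sys


def to_text(value, encoding=None):
    """Hypothetical helper: decode a byte string to text, pass anything else through."""
    if not isinstance(value, bytes):
        # Already text, or None because the option was not given.
        return value
    # Assumption: use UTF-8 when stdin reports no encoding (e.g. piped input).
    encoding = encoding or getattr(sys.stdin, "encoding", None) or "utf-8"
    return value.decode(encoding)


def parse(argv):
    """Parse --some-option from argv and return its value as text."""
    parser = optparse.OptionParser()
    parser.add_option("--some-option")
    opts, args = parser.parse_args(argv)
    return to_text(opts.some_option)


if __name__ == "__main__":
    print(repr(parse(sys.argv[1:])))
```

On Python 2, bytes is just another name for str, so the value optparse returns gets decoded explicitly instead of going through the implicit ascii conversion that raises the error. A value that is already unicode is returned unchanged.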