python unicode ascii apsw

How can strings with non-ASCII characters be retrieved with OptParse?


I'm using the OptParse module to retrieve a string value. OptParse only returns str-typed (byte) strings, not unicode ones.

So let's say I start my script with:

./someScript --some-option ééééé

French characters such as 'é' arrive as str values and trigger a UnicodeDecodeError when the code reads them:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 99: ordinal not in range(128)

I played around a bit with the unicode built-in function, but either I get an error, or the character disappears:

>>> unicode('é');
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
>>> unicode('é', errors='ignore');
u''

Is there anything I can do to use OptParse to retrieve unicode/utf-8 strings?

It seems that the string can be retrieved and printed OK. The error occurs later, when I use that string with SQLite (through the APSW module): cursor.execute("...") tries to convert it to unicode somehow, and that conversion fails.

Here is a sample program that causes the error:

#!/usr/bin/python
# coding: utf-8

import os, sys, optparse
parser = optparse.OptionParser()
parser.add_option("--some-option")
(opts, args) = parser.parse_args()
print unicode(opts.some_option)

Solution

  • Input is returned in the console encoding, so based on your updated example, use:

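    print opts.some_option.decode(sys.stdin.encoding)

    unicode(opts.some_option) fails because, with no encoding argument, unicode() decodes with the default encoding, which is ascii. The byte 0xc3 (the first byte of 'é' in UTF-8) is outside the ASCII range, hence the UnicodeDecodeError.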
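
A slightly more defensive variant of the same idea is sketched below. The helper names to_text and parse_some_option are hypothetical, not part of optparse. The sketch assumes the option bytes are in the console or locale encoding, and it falls back to UTF-8 only as a last resort, which is an assumption rather than a guarantee. The fallback chain is there because sys.stdin.encoding can be None in Python 2 when stdin is not a terminal (for example, when input is piped).

```python
import locale
import optparse
import sys


def to_text(value, encoding=None):
    """Return value as a text (unicode) string, decoding bytes if needed.

    Hypothetical helper: 'encoding' overrides the detected encoding.
    """
    if not isinstance(value, bytes):
        # Already text, or None when the option was not supplied.
        return value
    # Prefer the explicit encoding, then the console encoding, then the
    # locale's preferred encoding; UTF-8 is an assumed last resort.
    encoding = (encoding
                or getattr(sys.stdin, "encoding", None)
                or locale.getpreferredencoding()
                or "utf-8")
    return value.decode(encoding)


def parse_some_option(argv):
    """Parse --some-option from argv and return it as text."""
    parser = optparse.OptionParser()
    parser.add_option("--some-option")
    opts, _args = parser.parse_args(argv)
    return to_text(opts.some_option)
```

In a script this would be called as parse_some_option(sys.argv[1:]). Under Python 2 the result is a unicode object, which is presumably what should be handed to later code that expects text, such as the cursor.execute call in the question.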