I have this char in an xml file:
<data>
<products>
<color>fumè</color>
</product>
</data>
I try to generate an instance of ElementTree with the following code:
string_data = open('file.xml')
x = ElementTree.fromstring(unicode(string_data.encode('utf-8')))
and I get the following error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe8' in position 185: ordinal not in range(128)
(NOTE: The position is not exact, I sampled the xml from a larger one).
How to solve it? Thanks
You do not need to decode XML for ElementTree to work. XML carries it's own encoding information (defaulting to UTF-8) and ElementTree does the work for you, outputting unicode:
>>> data = '''\
... <data>
... <products>
... <color>fumè</color>
... </products>
... </data>
... '''
>>> x = ElementTree.fromstring(data)
>>> x[0][0].text
u'fum\xe8'
If your data is contained in a file(like) object, just pass the filename or file object directly to the ElementTree.parse()
function:
x = ElementTree.parse('file.xml')