This is a sample program I made:
>>> print u'\u1212'
ሒ
>>> print '\u1212'
\u1212
>>> print unicode('\u1212')
\u1212
Why do I get \u1212 instead of ሒ when I print unicode('\u1212')?
I'm making a program that stores data rather than printing it, so how do I store ሒ instead of \u1212? Now obviously I can't do something like:
x = u''+unicode('\u1212')
Interestingly, even if I do that, here's what I get:
\u1212
Another fact that I think is worth mentioning:
>>> u'\u1212' == unicode('\u1212')
False
What do I do to store ሒ or some other character like that instead of \uxxxx?
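'\u1212' is a byte string (a plain Python 2 str) with 6 ASCII characters: \, u, 1, 2, 1, and 2. The \u escape is only interpreted inside Unicode literals, so here the backslash is kept as-is.
unicode('\u1212') is a Unicode string with the same 6 characters: \, u, 1, 2, 1, and 2. Converting the type does not reinterpret the escape sequence.
u'\u1212' is a Unicode string with one character: ሒ.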
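The difference in length can be checked directly. This is a minimal sketch that is meant to run on both Python 2 and Python 3, so it spells the six-character string as b'\\u1212' (an assumption made for portability; on Python 2 this is the same value as the '\u1212' literal from the question). The names escaped and single are only illustrative.

```python
# -*- coding: utf-8 -*-
# Six bytes: backslash, 'u', '1', '2', '1', '2'.
# On Python 2 this equals the plain literal '\u1212'.
escaped = b'\\u1212'

# One Unicode character, U+1212.
single = u'\u1212'

print(len(escaped))  # number of bytes in the escaped form
print(len(single))   # number of characters in the Unicode string
```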
If that's what you want, you should use Unicode strings all around:
u'\u1212'
If for some reason you need to convert '\u1212' to u'\u1212', use:
'\u1212'.decode('unicode-escape')
(Note that in Python 3, strings are always Unicode, so '\u1212' is already the single character ሒ there.)
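As a rough illustration of the conversion above, here is a small sketch that should behave the same on Python 2 and Python 3. The helper name unescape is hypothetical, and the input is written as a bytes literal (b'\\u1212') so that it holds the six escaped characters on either version. The final UTF-8 step is just one possible way to store the character as bytes, not something the answer above prescribes.

```python
# -*- coding: utf-8 -*-

def unescape(raw):
    """Turn escaped bytes such as b'\\u1212' into a Unicode string.

    'unescape' is an illustrative name, not a standard library function.
    """
    return raw.decode('unicode-escape')

escaped = b'\\u1212'          # six bytes: \ u 1 2 1 2
decoded = unescape(escaped)   # one character: U+1212

# The decoded value matches the Unicode literal.
print(decoded == u'\u1212')

# Going the other way recovers the escaped form.
round_trip = decoded.encode('unicode-escape')

# One possible storage format (an assumption): UTF-8 bytes.
stored = decoded.encode('utf-8')
```

If the data already lives in Unicode strings from the start, none of this decoding should be needed; it is only useful when escaped text arrives from somewhere else.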