unicode python-2.x

Difference between u'string' and unicode(string)


This is a sample program I made:

>>> print u'\u1212'
ሒ
>>> print '\u1212'
\u1212
>>> print unicode('\u1212')
\u1212

Why do I get \u1212 instead of ሒ when I print unicode('\u1212')?

I'm making a program to store data and not print it, so how do I store ሒ instead of \u1212? Now obviously I can't do something like:

x = u''+unicode('\u1212')

Interestingly even if I do that, here's what I get:

\u1212

Another fact that I think is worth mentioning:

>>> u'\u1212' == unicode('\u1212')
False

What do I do to store ሒ or some other character like that instead of \uxxxx?


Solution

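  • '\u1212' is a byte string (str) with 6 ASCII characters: \, u, 1, 2, 1, and 2. In Python 2, the \u escape is only interpreted inside u'...' literals.

    unicode('\u1212') is a Unicode string with the same 6 characters: \, u, 1, 2, 1, and 2. unicode() only decodes the bytes it is given; it does not interpret escape sequences.

    u'\u1212' is a Unicode string with one character: ሒ.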

    If that single character is what you want, use Unicode literals throughout:

    u'\u1212'
    

    If for some reason you need to convert '\u1212' to u'\u1212', use:

    '\u1212'.decode('unicode-escape')
    

    (Note that in Python 3, strings are always Unicode.)
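To make the difference concrete, here is a minimal sketch that should run on both Python 2.7 and Python 3. The helper name `unescape` is made up for illustration. The input is written as `b'\\u1212'` so that it is the same six-byte string on either version, which is what `'\u1212'` means in Python 2.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def unescape(raw_bytes):
    """Turn literal backslash-u escapes in a byte string into real characters.

    `unescape` is a hypothetical helper name; it only wraps the
    'unicode_escape' codec that ships with Python.
    """
    return raw_bytes.decode('unicode_escape')


escaped = b'\\u1212'         # six bytes: \ u 1 2 1 2
decoded = unescape(escaped)  # one code point, U+1212

print(len(escaped))            # 6
print(len(decoded))            # 1
print(decoded == u'\u1212')    # True
```

The lengths show why the comparison in the question returns False: the two strings hold different characters until the escape is decoded.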