
Python unicode string literals :: what's the difference between '\u0391' and u'\u0391'


I am using Python 2.7.3. Can anybody explain the difference between the literals:

'\u0391'

and:

u'\u0391'

and the different way they are echoed in the REPL below (especially the extra backslash added to a1):

>>> a1='\u0391'
>>> a1
'\\u0391'
>>> type(a1)
<type 'str'>
>>> 
>>> a2=u'\u0391'
>>> a2
u'\u0391'
>>> type(a2)
<type 'unicode'>
>>> 

Solution

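  • You can only use Unicode escapes (\uabcd) in a Unicode string literal; they have no meaning in a byte string. In '\u0391' the backslash is kept as an ordinary character, so a1 holds the six characters \u0391, and the REPL echoes it as '\\u0391' because repr() escapes a literal backslash by doubling it. A Python 2 Unicode literal (u'some text', type unicode) is a different type of Python object from a Python byte string ('some text', type str).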

    It's like using \t versus \T: the former is an escape sequence in Python literals (it is interpreted as a tab character), while the latter just means a backslash followed by a capital T (two characters).

    To understand the difference between Unicode and byte strings, do read the Python Unicode HOWTO; I can also recommend Joel Spolsky's article on Unicode.

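    Note: in Python 3 the same distinction applies, but 'some text' is a Unicode string literal and b'some text' is the byte string syntax. (Since Python 3.3 the u'some text' prefix is accepted again and means the same as a plain literal.)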
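The same behaviour can be reproduced in a minimal sketch. It assumes Python 3.3 or later rather than the 2.7.3 interpreter from the question, so a bytes literal stands in for the question's '\u0391' and a plain str literal for u'\u0391'. The bytes literal spells the backslash out as \\ because recent Python 3 releases warn about the unrecognized \u escape inside a bytes literal; the six bytes it produces are the same either way. The variable names are illustrative only.

```python
# Sketch for Python 3.3+: bytes play the role of Python 2's str,
# str plays the role of Python 2's unicode.

# Byte string: \u is not an escape here, so the backslash is written out explicitly.
# The value is six bytes: a backslash followed by the characters u0391.
raw = b'\\u0391'

# Unicode string: \u0391 is an escape for a single code point (GREEK CAPITAL LETTER ALPHA).
alpha = '\u0391'
alpha_u = u'\u0391'  # the u prefix is accepted again since Python 3.3 and changes nothing

print(len(raw), repr(raw))         # 6 b'\\u0391'  (repr doubles the literal backslash, like a1)
print(len(alpha), hex(ord(alpha))) # 1 0x391
print(alpha == alpha_u)            # True

# Same idea as \t versus \T: '\t' is one tab character, '\\T' is a backslash plus a capital T.
print(len('\t'), len('\\T'))       # 1 2

# Moving between the two types:
decoded = raw.decode('unicode_escape')  # interpret the backslash escape stored in the bytes
encoded = alpha.encode('utf-8')         # encode the single code point as UTF-8 bytes
print(decoded == alpha, encoded)        # True b'\xce\x91'
```

The unicode_escape decode is only a convenience for text that really contains escape sequences; for ordinary encoded text you would decode with the actual encoding (for example UTF-8) instead.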