
Python .split() without the u' prefix


In Python, if I have a string like:

a = " Hello - to - everybody"

And I do

a.split('-')

then I get

[u'Hello', u'to', u'everybody']

This is just an example.

How can I get a plain list without that annoying u'?


Solution

  • The u means that it's a unicode string; your original string must also have been a unicode string. Generally it's a good idea to keep strings as Unicode, because converting them to normal (byte) strings can fail on characters that have no equivalent in the target encoding.

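    The u appears only in the representation, to tell you that it's a unicode string; it is not part of the string itself. Printing a list shows the repr of each element, which is why you see it there, whereas printing a string on its own shows just its characters.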

    In general, unicode strings work just like normal strings, so there should be no problem with leaving them as unicode.

    In Python 3.x, unicode strings are the default and are displayed without the u prefix; instead, bytes (the equivalent of the old Python 2 strings) are displayed with a b prefix.

    If you really, really need to convert to a normal string (rarely the case, but it can matter if, for example, you are using an extension library that doesn't support unicode strings), take a look at unicode.encode() (and str.decode() to go back the other way). You can do this either before the split or after it, using a list comprehension.
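The point that the u lives only in the representation can be seen in a small sketch. It assumes the example string is a unicode literal, and the names `a` and `parts` are purely illustrative. Printing the list shows each element's repr (prefix and quotes included); printing or joining the elements shows only their characters. It should run unchanged on Python 2.7 and on Python 3.3+, where the u"" literal syntax is accepted again.

```python
a = u"Hello - to - everybody"

# Split on the separator, then strip the surrounding spaces from each piece
parts = [part.strip() for part in a.split(u"-")]

# A single-argument print(...) behaves the same on Python 2 and 3.
# Python 2 prints [u'Hello', u'to', u'everybody']; Python 3 prints ['Hello', 'to', 'everybody']
print(parts)

# Joining and printing shows the characters themselves, with no prefix or quotes
print(u", ".join(parts))

for part in parts:
    print(part)  # each piece printed on its own line, again without any u'
```

If all you want is output without the u, printing the strings (rather than the list that contains them) is enough; no conversion is needed.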
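On Python 3 the same split gives plain str objects, and a prefix only shows up once you work with bytes. A short sketch, assuming Python 3; UTF-8 is just one illustrative choice of codec.

```python
a = "Hello - to - everybody"   # str holds Unicode text in Python 3

parts = a.split("-")
print(parts)                   # ['Hello ', ' to ', ' everybody'] with no u prefix

raw = a.encode("utf-8")        # bytes: the counterpart of the old Python 2 str
print(raw.split(b"-"))         # [b'Hello ', b' to ', b' everybody']

# The prefix comes from repr(), not from the data
print(repr(parts[0]))          # 'Hello '
print(repr(raw[:5]))           # b'Hello'
```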
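The two conversion routes (encode before the split, or split and then encode with a list comprehension) can be sketched as follows. This assumes UTF-8 is an acceptable target encoding; the codec an extension library actually expects may differ, and the variable names are illustrative. The last lines show why keeping Unicode is usually safer: a narrow codec such as ASCII raises UnicodeEncodeError for characters it cannot represent. It should run on Python 2.7 and 3.3+; on Python 3 the encoded pieces are bytes rather than str.

```python
a = u"Hello - to - everybody"

# Option 1: encode the whole string first, then split the encoded result
encoded_then_split = a.encode("utf-8").split(b"-")

# Option 2: split the unicode string, then encode each piece
split_then_encoded = [part.encode("utf-8") for part in a.split(u"-")]

# Both routes give the same byte strings
assert encoded_then_split == split_then_encoded

# Decoding each piece turns it back into unicode
decoded = [piece.decode("utf-8") for piece in split_then_encoded]
assert decoded == a.split(u"-")

# A character with no ASCII equivalent makes the conversion fail
try:
    u"caf\xe9".encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode: %s" % exc)
```

Splitting the UTF-8 bytes on b"-" is safe because an ASCII byte such as "-" never occurs inside a multi-byte UTF-8 sequence; with other encodings, splitting before encoding is the more robust choice.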