Tags: python, python-unicode, hebrew

How to decode and encode Hebrew strings?


I am trying to encode and decode the Hebrew string "שלום". However, when I print the string after encoding it, I get gibberish:

>>> word = "שלום"
>>> word = word.decode('UTF-8')
>>> word
u'\u05e9\u05dc\u05d5\u05dd'
>>> print word
שלום
>>> word = word.encode('UTF-8')
>>> word
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> print word
׳©׳׳•׳

How should I do it properly?


Solution

  • Make sure your environment (shell or script) uses the right encoding. If you're running a script, include the following at the top:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    

    This declaration tells Python that the source file is encoded in UTF-8. Your terminal must also support UTF-8: if it accepts only ASCII or uses a different code page, the encoded bytes will print as gibberish. With both in place, the round trip works:

    >>> word = "שלום"
    >>> word
    '\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
    >>> print word
    שלום
    >>> word = word.decode('UTF-8')
    >>> word
    u'\u05e9\u05dc\u05d5\u05dd'
    >>> print word
    שלום
    >>> word = word.encode('UTF-8')
    >>> word
    '\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
    >>> print word
    שלום
    >>>
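
As a rough sketch of the same round trip in script form, the snippet below keeps the text and its UTF-8 bytes in separate variables and reports the encoding Python thinks the terminal uses. The variable names (`text`, `data`, `roundtrip`, `terminal_encoding`) are illustrative and not part of the original answer. It avoids printing the Hebrew text itself, so it should run even on a terminal that cannot display it.

```python
# -*- coding: utf-8 -*-
import sys

# Text (unicode) string: four Hebrew code points.
text = u"שלום"

# Encoding turns text into bytes; UTF-8 uses two bytes per Hebrew letter.
data = text.encode("utf-8")

# Decoding with the same codec turns the bytes back into the original text.
roundtrip = data.decode("utf-8")

# Encoding Python believes stdout uses; may be None when output is piped.
terminal_encoding = getattr(sys.stdout, "encoding", None)

print(repr(data))
print(roundtrip == text)
print("stdout encoding: %s" % terminal_encoding)
```

If the reported stdout encoding is not UTF-8, that is a likely reason for encoded bytes showing up as gibberish: the terminal is interpreting the UTF-8 bytes under a different code page.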