python, character-encoding, ascii, icu, pyicu

How to transliterate Unicode text to ASCII with PyICU?


There is the PyICU library, which I understand can be used to transliterate strings. However, there are no docs. Does anyone have a simple example that transliterates a Unicode string to ASCII with PyICU?

The C++ ICU documentation for transliteration is here, but I don't understand how to call it from Python.


Solution

  • There is a nice cheat sheet for PyICU here: https://gist.github.com/dpk/8325992

    Here's a slightly modified example:

    >>> import icu
    >>> tl = icu.Transliterator.createInstance('Any-Latin; Latin-ASCII')
    >>> tl.transliterate('Ψάπφω')
    'Psappho'