
Python unicode translation leaves input unchanged


I have written Python code that is supposed to translate/transliterate Persian characters. Here is a chunk of the translation table:

dictionary = {
'\u062B': 's̱',
'\u062C': 'ǧ',
}

'\u062B' is "ث", which should be translated to "s̱".

But when I run the following:

string = '\u062B'
print("Original string:", string)
print("Translated string:", string.translate(dictionary))

My original string and Translated string are the same:

Original string: ث
Translated string: ث

So the translation doesn't occur. What am I doing wrong?


Solution

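  • The table passed to str.translate must map Unicode ordinals (i.e. integers) to ordinals, strings, or None. The keys in dictionary are one-character strings, not integers, so no lookup ever matches and the input comes back unchanged. Use str.maketrans to convert the string-to-string mapping appropriately: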

    >>> string
    'ث'
    >>> str.maketrans(dictionary)
    {1579: 's̱', 1580: 'ǧ'}
    >>> string.translate(str.maketrans(dictionary))
    's̱'
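
For completeness, here is a minimal self-contained sketch of the same fix as a script rather than a REPL session. It reuses the two entries from the question; the helper name transliterate and the variable names table and table_by_hand are made up for illustration. It also shows that building the table by hand with ord() should give the same result as str.maketrans.

```python
# Mapping from the question: one Persian character -> Latin transliteration.
dictionary = {
    '\u062B': 's̱',  # ث
    '\u062C': 'ǧ',  # ج
}

# Option 1: let str.maketrans turn the one-character keys into ordinals.
table = str.maketrans(dictionary)

# Option 2: build the same integer-keyed table by hand with ord().
table_by_hand = {ord(char): replacement for char, replacement in dictionary.items()}


def transliterate(text, table=table):
    """Return text with every character found in table replaced.

    Hypothetical helper for illustration; characters that are not in
    the table pass through unchanged.
    """
    return text.translate(table)


if __name__ == "__main__":
    original = '\u062B\u062C'
    print("Original string:", original)
    print("Translated string:", transliterate(original))
    # Passing the string-keyed dict directly silently does nothing:
    print("Without maketrans:", original.translate(dictionary))
```

Note that str.maketrans only accepts a dict whose keys are single characters or integers, so multi-character source sequences would need a different approach (for example, repeated str.replace or a regular expression).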