I am able to convert an Hindi script written in English back to Hindi
import codecs,string
from indic_transliteration import sanscript
from indic_transliteration.sanscript import SchemeMap, SCHEMES, transliterate
def is_hindi(character):
maxchar = max(character)
if u'\u0900' <= maxchar <= u'\u097f':
return character
else:
print(transliterate(character, sanscript.ITRANS, sanscript.DEVANAGARI)
character = 'bakrya'
is_hindi(character)
Output:
बक्र्य
But If I try to do something like this, I don't get any conversions
character = 'Bakrya विकणे आहे'
is_hindi(character)
Output:
Bakrya विकणे आहे
Expected Output:
बक्र्य विकणे आहे
I also tried the library Polyglot but I am getting similar results with it.
Preface: I know nothing of devanagari, so you will have to bear with me.
First, consider your function. It can return two things, character
or None
(print just outputs something, it doesn't actually return a value). That makes your first output example originate from the print function, not Python evaluating your last statement.
Then, when you consider your second test string, it will see that there's some Devanagari text and just return the string back. What you have to do, if this transliteration works as I think it does, is to apply this function to every word in your text.
I modified your function to:
def is_hindi(character):
maxchar = max(character)
if u'\u0900' <= maxchar <= u'\u097f':
return character
else:
return transliterate(character, sanscript.ITRANS, sanscript.DEVANAGARI)
and modified your call to
' '.join(map(is_hindi, character.split()))
Let me explain, from right to left. First, I split your test string into the separate words with .split()
. Then, I map (i.e., apply the function to every element) the new is_hindi
function to this new list. Last, I join the separate words with a space to return your converted string.
Output:
'बक्र्य विकणे आहे'
If I may suggest, I would place this splitting/mapping functionality into another function, to make things easier to apply.
Edit: I had to modify your test string from 'Bakrya विकणे आहे'
to 'bakrya विकणे आहे'
because B
wasn't being converted. This can be fixed in a generic text with character.lower()
.