python nlp transliteration indic

Transliterate sentence written in 2 different scripts to a single script


I am able to convert Hindi text written in the Latin (English) script back to Devanagari:

import codecs,string
from indic_transliteration import sanscript
from indic_transliteration.sanscript import SchemeMap, SCHEMES, transliterate


def is_hindi(character):
    maxchar = max(character)
    if u'\u0900' <= maxchar <= u'\u097f':
        return character
    else:
        print(transliterate(character, sanscript.ITRANS, sanscript.DEVANAGARI))

character = 'bakrya'
is_hindi(character)

Output:
बक्र्य

But if I try something like this, nothing gets converted:

character = 'Bakrya विकणे आहे'
is_hindi(character)

Output:
Bakrya विकणे आहे

Expected Output:
बक्र्य विकणे आहे

I also tried the Polyglot library, but I get similar results with it.


Solution

  • Preface: I know nothing of Devanagari, so you will have to bear with me.

    First, consider your function. It can return two things: character or None (print just outputs something, it doesn't actually return a value). So the output in your first example comes from the print call, not from Python echoing the value of your last statement.
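
    To illustrate the difference, here is a small sketch with two hypothetical functions (shout and give_back are made-up names, not part of any library): one prints, the other returns.

```python
def shout(text):
    # print writes to standard output; the function itself returns None
    print(text)


def give_back(text):
    # return hands the value to the caller, which can store or reuse it
    return text


printed = shout("bakrya")        # "bakrya" appears on screen
returned = give_back("bakrya")   # nothing appears on screen

print(printed is None)           # True: nothing came back from shout
print(returned)                  # bakrya
```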

    Now consider your second test string. Because max() looks at the whole string, the function sees that it contains some Devanagari text and just returns the string unchanged. What you have to do, if this transliteration works as I think it does, is apply the function to every word in your text.
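
    A quick way to see this, using only built-ins (the variable names are just for illustration): max() on a string returns the character with the highest code point, so a single Devanagari character anywhere in the sentence is enough to pass the range check.

```python
mixed = 'Bakrya विकणे आहे'

# max() on a string picks the character with the largest Unicode code point
highest = max(mixed)
print('\u0900' <= highest <= '\u097f')   # True: the whole sentence counts as "Hindi"

# Checking word by word separates the Latin word from the Devanagari ones
per_word = ['\u0900' <= max(word) <= '\u097f' for word in mixed.split()]
print(per_word)                           # [False, True, True]
```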

    I modified your function to:

    def is_hindi(character):
        maxchar = max(character)
        if u'\u0900' <= maxchar <= u'\u097f':
            return character
        else:
            return transliterate(character, sanscript.ITRANS, sanscript.DEVANAGARI)
    

    and modified your call to

    ' '.join(map(is_hindi, character.split()))
    

    Let me explain, from right to left. First, I split your test string into the separate words with .split(). Then, I map (i.e., apply the function to every element) the new is_hindi function to this new list. Last, I join the separate words with a space to return your converted string.

    Output:

    'बक्र्य विकणे आहे'
    

    If I may suggest, I would place this splitting/mapping functionality into another function, to make things easier to apply.
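
    A minimal sketch of such a wrapper follows. The names is_devanagari, to_single_script and convert_word are hypothetical; the converter is passed in as a parameter so the splitting logic can be tried without the library installed, and the commented-out lines show how the transliterate call from your code would plug in.

```python
def is_devanagari(word):
    # Same range check as in is_hindi: highest code point inside U+0900..U+097F
    return '\u0900' <= max(word) <= '\u097f'


def to_single_script(text, convert_word):
    """Convert every non-Devanagari word with convert_word; keep the rest as is."""
    return ' '.join(
        word if is_devanagari(word) else convert_word(word)
        for word in text.split()
    )


# Stand-in converter (str.upper), only to show which words get touched
print(to_single_script('bakrya विकणे आहे', str.upper))   # BAKRYA विकणे आहे

# With indic_transliteration installed, the real converter would be:
# from indic_transliteration import sanscript
# from indic_transliteration.sanscript import transliterate
# to_single_script('bakrya विकणे आहे',
#                  lambda w: transliterate(w, sanscript.ITRANS, sanscript.DEVANAGARI))
```

    Note that splitting on whitespace and joining with a single space collapses any runs of spaces, tabs or newlines in the input.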

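    Edit: I had to change your test string from 'Bakrya विकणे आहे' to 'bakrya विकणे आहे' because the capital B wasn't being converted. For generic text, this can be handled by calling character.lower() first. Keep in mind that ITRANS is case-sensitive (capital letters encode different sounds, such as long vowels), so lowercasing is only safe when the input is casual romanization rather than strict ITRANS.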