javascript jison

How to tokenize Bangla digits as numbers and work with them?


Salam ❤️ I am a beginner at Jison, learning small things day by day. How can I work with Bangla numbers in Jison? That is, how can I use Bangla digits as the NUMBER token and compute with them (the result must also be in Bangla)? I used a regex, but it only tokenizes; it does not parse 😪 Please help me ❤️ Thank you


Solution

  • Jison doesn't attempt to convert between numbers and strings. All it does is identify where the numbers in the input are and how they relate to the other tokens in the input. That's what parsing is about: dividing text into parts. The rest is interpretation, and for that you need to use the programming language you are working with, in this case JavaScript.

    Unfortunately (and slightly surprisingly) JavaScript's Unicode support is not very complete. In particular, it does not provide any official interface to the Unicode Character Database (UCD) or the Unicode properties in that database, except the minimum needed to implement a subset of Unicode regular expression matching (and that only if the regular expression has the u flag set). So you can't do what would seem logical, which is consult the Numeric_Value property of each character.
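    To make the limitation concrete, here is a small sketch (the variable names are only illustrative) of what the u flag's property escapes can and cannot do. They can recognise a character as a decimal digit, and even restrict the match to the Bengali script, but nothing in the result tells you the digit's numeric value:

```javascript
// With the u flag, \p{Nd} matches a decimal digit in any script.
const anyDigits = /^\p{Nd}+$/u;
console.log(anyDigits.test("৮৭৬")); // true
console.log(anyDigits.test("876")); // true

// A lookahead combines two properties: "is a decimal digit" and
// "belongs to the Bengali script".
const banglaDigits = /^(?:(?=\p{Nd})\p{Script=Bengali})+$/u;
console.log(banglaDigits.test("৮৭৬")); // true
console.log(banglaDigits.test("876")); // false

// Recognising the digits is as far as it goes: the built-in string-to-number
// conversion only understands ASCII digits.
console.log(Number("৮৭৬")); // NaN
```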

    But since you're only interested in Bangla digits and not digits in all the scripts which Unicode can represent, it's reasonable to just hard-code the translation. So you could convert a Bangla number to a JavaScript number (that is, not a string) using

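    // Shift each Bangla digit (U+09E6..U+09EF) down by 2486 to its ASCII digit,
    // then let unary + convert the resulting string to a number.
    const numberFromBangla =
        str => +(str.replace(/[\u09e6-\u09ef]/g,
                             digit => String.fromCharCode(digit.charCodeAt(0)-2486)))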
    

    And convert a number back to a Bangla string using

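    // Stringify the number, then shift each ASCII digit up by 2486
    // to the corresponding Bangla digit.
    const banglaFromNumber =
        n => ("" + n).replace(/[0-9]/g,
                              digit=>String.fromCharCode(digit.charCodeAt(0)+2486))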
    

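    With a more modern JavaScript, you could use replaceAll instead of replace. Note that when the pattern is a regular expression, replaceAll still requires the g flag and throws a TypeError without it, so the gain is mostly in readability. You could also use codePointAt instead of charCodeAt if your JavaScript environment supports it, but in the case of Bangla digits it makes no difference at all.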
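    As a rough sketch of that variant (the names OFFSET, numberFromBanglaModern and banglaFromNumberModern are made up for this example, and it assumes an environment that has String.prototype.replaceAll), the two helpers could look like this, followed by the kind of computation a grammar action for addition would perform:

```javascript
// Distance between a Bangla digit and its ASCII counterpart:
// U+09E6 (BENGALI DIGIT ZERO) minus U+0030 (DIGIT ZERO) = 2486.
const OFFSET = 0x09e6 - 0x30;

// replaceAll accepts a regular expression only if it has the g flag;
// without it, replaceAll throws a TypeError.
const numberFromBanglaModern = str =>
  Number(str.replaceAll(/[\u09e6-\u09ef]/g,
    digit => String.fromCodePoint(digit.codePointAt(0) - OFFSET)));

const banglaFromNumberModern = n =>
  String(n).replaceAll(/[0-9]/g,
    digit => String.fromCodePoint(digit.codePointAt(0) + OFFSET));

// Convert the matched text to numbers, compute, and convert the result
// back to Bangla for display.
const sum = numberFromBanglaModern("১২") + numberFromBanglaModern("৩৪");
console.log(banglaFromNumberModern(sum)); // ৪৬
```

    In a Jison grammar, the first helper would typically be applied to the matched text in the action for the NUMBER rule, and the second only when the final result is printed, so that all intermediate arithmetic is done on ordinary JavaScript numbers.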

    Note that the above does not handle commas, either for input or for output. If you want to write ৮৭৬৫৪৩২ as ৮৭,৬৫,৪৩২, you'll need to write some more code.
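    One possible way to add that, as a sketch only: the helper names below are hypothetical, and the code assumes non-negative integers grouped as in the example above (the last three digits, then groups of two). Grouping is done on the ASCII digits first, because \B in a JavaScript regular expression is defined in terms of ASCII word characters.

```javascript
// Insert commas into a string of ASCII digits: last three digits,
// then groups of two to the left of them.
const groupDigits = digits => {
  if (digits.length <= 3) return digits;
  const head = digits.slice(0, -3);
  const tail = digits.slice(-3);
  // Put a comma at every position inside head that is followed by
  // a whole number of digit pairs.
  return head.replace(/\B(?=(?:\d\d)+$)/g, ",") + "," + tail;
};

// Number -> grouped Bangla string (non-negative integers only).
const banglaFromNumberGrouped = n =>
  groupDigits(String(n)).replace(/[0-9]/g,
    digit => String.fromCharCode(digit.charCodeAt(0) + 2486));

// Grouped Bangla string -> number: drop the commas, then shift the digits.
const numberFromBanglaGrouped = str =>
  Number(str.replace(/,/g, "").replace(/[\u09e6-\u09ef]/g,
    digit => String.fromCharCode(digit.charCodeAt(0) - 2486)));

console.log(banglaFromNumberGrouped(8765432));     // ৮৭,৬৫,৪৩২
console.log(numberFromBanglaGrouped("৮৭,৬৫,৪৩২")); // 8765432
```

    If commas should be accepted in the input, the lexer pattern for NUMBER has to allow them as well; otherwise the comma never reaches the conversion function.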