javascript regex parsing jison

Error while tokenizing Bangla digits as NUMBER tokens using a regex


I'm new to Jison and I want to tokenize the Bangla digits ০-৯ as numbers. I tried the regular expression below, but it does not work: Regular expression: (^[\u09E6-\u09EF])+("."[\u09E6-\u09EF])\b

When I test it with ৭+১, it shows: expected... 'NUMBER' GOT 'Invalid' 😅 Expected result: NUMBER '+' NUMBER

Please help me out!! ❤️


Solution

  • Good question.

    The problem is the \b word-boundary assertion. JavaScript's regular expression specification does not treat Bangla digits as word characters: for \w and \b, only ASCII letters, ASCII digits, and the underscore count as word characters.

    Consequently, the position between a Bangla digit and a plus sign (which is certainly not a word character either) is not a word boundary, so the assertion fails and the pattern does not match.

    If you just drop the \b, it should work (although I would also drop the ^: Jison patterns are always anchored, so there's no need to insist).
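The \b behaviour described above can be checked in plain JavaScript, outside Jison. This is a minimal sketch, not the asker's actual lexer: the constant names and the `matchNumber` helper are hypothetical. The optional fractional part in `banglaNumber` is an assumption about what the original pattern was meant to accept. The explicit ^ is only needed here because plain JavaScript regexes, unlike Jison patterns, are not anchored by default.

```javascript
// Bangla digits ০-৯ occupy U+09E6 through U+09EF.

// Pattern ending in \b: fails on "৭+১" because neither "৭" nor "+"
// is a word character, so there is no word boundary between them.
const withBoundary = /^[\u09E6-\u09EF]+\b/;

// Same pattern with the \b removed.
const withoutBoundary = /^[\u09E6-\u09EF]+/;

// Assumed intent: one or more digits, optionally followed by "." and more digits.
const banglaNumber = /^[\u09E6-\u09EF]+(?:\.[\u09E6-\u09EF]+)?/;

// Hypothetical helper: returns the leading Bangla number in `input`, or null.
function matchNumber(input) {
  const m = banglaNumber.exec(input);
  return m === null ? null : m[0];
}

const boundaryResult = withBoundary.exec("৭+১");   // no match
const plainResult = withoutBoundary.exec("৭+১");   // matches the leading digit
const asciiResult = /^[0-9]+\b/.exec("7+1");       // ASCII digits do form a boundary

console.log(boundaryResult, plainResult && plainResult[0], asciiResult && asciiResult[0]);
console.log(matchNumber("৭.৫+১"), matchNumber("+১"));
```

The ASCII case is included for contrast: the same pattern shape matches "7" in "7+1", which is why the \b looks harmless until non-ASCII digits are involved.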