javascript regex hindi

Regex to keep all letters in all alphabets along with digits and underscores (problem with Hindi letters)


I found a regex pattern that matches letters in any alphabet: \p{L}

I then wrote a regex to remove every character that is not a letter, a digit, or an underscore: /[^\p{L}\d_]/gimu

Unfortunately, it does not work with Hindi text: #फ्रांस becomes फरस instead of फ्रांस.

See for yourself here: https://regex101.com/r/dnXDK0/1

And please help me :-)


Solution

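  • You forgot about diacritics. In Devanagari, the vowel signs, the anusvara and the virama are combining marks (\p{M}), not letters (\p{L}), so your pattern strips them. You need to add \p{M} into the negated character class. (\p{Mn} covers only non-spacing marks, so on its own it would still drop spacing vowel signs such as ा.)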

    /[^\p{L}\p{M}\d_]/gu
    

    See the regex demo.

    Note that you do not need the i and m flags here. m changes how the ^ and $ anchors behave, but your regex contains no anchors (the ^ inside the brackets only negates the character class). i makes cased letters match case-insensitively, but \p{L} already matches all letters, both upper- and lowercase.
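
To make the difference concrete, here is a minimal sketch that runs both patterns on the string from the question. The helper name keepWordChars is hypothetical and used only for illustration. It assumes a runtime with Unicode property escapes (ES2018 or later).

```javascript
// Hypothetical helper: removes everything except letters, combining marks,
// ASCII digits and underscores, using the pattern from the answer.
function keepWordChars(input) {
  // \p{L} = letters, \p{M} = combining marks (vowel signs, virama, ...),
  // \d = ASCII digits 0-9, _ = underscore. The u flag enables \p{...}.
  return input.replace(/[^\p{L}\p{M}\d_]/gu, "");
}

const original = "#फ्रांस";

// Pattern from the question: marks are not letters, so they are removed too.
const lettersOnly = original.replace(/[^\p{L}\d_]/gu, "");

// Pattern from the answer: marks survive, only "#" is removed.
const withMarks = keepWordChars(original);

console.log(lettersOnly); // फरस
console.log(withMarks);   // फ्रांस
```

One design note: in JavaScript, \d matches only the ASCII digits 0-9, even with the u flag. If digits from other scripts (for example Devanagari ०-९) should be kept as well, replacing \d with \p{Nd} is one option.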