
Ruby regex modifies user input when using gsub on text in regional languages


I am using a Ruby regex to filter user input so that only digits and letters from any language are allowed. However, some words come out spelled differently after the substitution. For example:

text = 'कंप्यूटर'
regex = /[^(\p{Alpha})]/
filter_text = text.gsub(regex, '') # returns कंपयूटर

As you can see, the output differs from the input. How can I fix this?
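A quick way to see exactly which character the original pattern drops is to compare the code points of the input and the result. This is only a diagnostic sketch, not part of the fix; the printed line below is what the result reported above implies (the only difference between the two strings is the virama ्).

```ruby
# Diagnostic: list the code points that /[^(\p{Alpha})]/ removed from the input.
text = 'कंप्यूटर'
filtered = text.gsub(/[^(\p{Alpha})]/, '')

# Array#- removes every code point that still appears in the filtered string.
removed = text.codepoints - filtered.codepoints
removed.each do |cp|
  puts format('U+%04X %s', cp, cp.chr(Encoding::UTF_8))
end
# prints: U+094D ्
```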


Solution

  • You can use

    regex = /[^\p{L}\p{Nd}\p{M}]+/
    

    It matches one or more consecutive characters that are not Unicode letters, combining marks, or decimal digits.

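    \p{L} matches any Unicode letter, \p{M} matches any combining mark (diacritics, Indic vowel signs, the virama, and so on), and \p{Nd} matches any character in the 'Number, Decimal Digit' category. The \p{M} part is what fixes your problem: the virama ् (U+094D) that joins प and य in कंप्यूटर is a combining mark without the Unicode Alphabetic property, so \p{Alpha} does not match it and your original pattern deletes it. Note also that ( and ) inside [...] are literal characters, so your original pattern would have let parentheses through.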

    See the Ruby demo:

    text = 'कंप्यूटर'
    regex = /[^\p{L}\p{Nd}\p{M}]+/
    filter_text = text.gsub(regex, '')
    puts filter_text
    # => कंप्यूटर
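If the filter is needed in several places, it can be wrapped in a small helper. The constant `ALLOWED_FILTER` and the method `filter_letters_and_digits` are hypothetical names chosen for this sketch; the pattern is the same one used above. The examples show that Devanagari digits (१२३, category Nd) are kept while spaces and punctuation are removed.

```ruby
# Hypothetical helper: keep Unicode letters (\p{L}), combining marks (\p{M})
# and decimal digits (\p{Nd}); delete every other run of characters.
ALLOWED_FILTER = /[^\p{L}\p{M}\p{Nd}]+/

def filter_letters_and_digits(input)
  input.gsub(ALLOWED_FILTER, '')
end

puts filter_letters_and_digits('कंप्यूटर')          # => कंप्यूटर (virama kept)
puts filter_letters_and_digits('कंप्यूटर १२३!')     # => कंप्यूटर१२३ (space and "!" removed)
puts filter_letters_and_digits('Café (2024) नमस्ते') # => Café2024नमस्ते
```

Using a negated class with `+` lets gsub remove each run of unwanted characters in a single replacement rather than one character at a time; the result is the same either way.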