I am using a Ruby regex to filter user input so that only digits and letters of any language are allowed. For some words, however, the spelling changes after filtering. For example:
text = 'कंप्यूटर'
regex = /[^(\p{Alpha})]/
filter_text = text.gsub(regex, '') # => कंपयूटर
As you can see, the input and output differ. How can I resolve this?
You can use
regex = /[^\p{L}\p{Nd}\p{M}]+/
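It matches one or more characters other than Unicode letters, decimal digits, and combining marks, so gsub removes only those. \p{L} matches any Unicode letter, \p{Nd} matches any character in the 'Number, Decimal Digit' category, and \p{M} matches any combining mark (diacritics, vowel signs, and the virama). Your original pattern removed the virama ् (U+094D), which is a combining mark that \p{Alpha} does not match, and that is why the spelling changed. Note also that the parentheses inside [^(\p{Alpha})] are literal characters, not a group, so they only cause ( and ) to be kept.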
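To see why \p{M} is needed, here is a small sketch that classifies each code point of the sample word as a letter, a mark, or something else. The variable name classified is only for illustration; the property names are the same ones used in the pattern above.

```ruby
text = 'कंप्यूटर'

# Classify every code point: letter (\p{L}), combining mark (\p{M}), or other.
classified = text.each_char.map do |ch|
  kind =
    if ch.match?(/\p{L}/)
      :letter
    elsif ch.match?(/\p{M}/)
      :mark
    else
      :other
    end
  [format('U+%04X', ch.ord), kind]
end

# Print one line per code point, e.g. the code point and its class.
classified.each { |code, kind| puts "#{code} #{kind}" }

# The virama (U+094D) is the character the original pattern removed.
# It is a combining mark, so \p{M} keeps it.
puts "\u094D".match?(/\p{M}/)
```

The word contains several marks between its letters, so any pattern that keeps only letters will break the spelling.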
See the Ruby demo:
text = 'कंप्यूटर'
regex = /[^\p{L}\p{Nd}\p{M}]+/
filter_text = text.gsub(regex, '')
puts filter_text
# => कंप्यूटर
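If the filter is applied in several places, it could be wrapped in a small helper. This is only a sketch: the method name keep_letters_digits_marks is hypothetical, and the sample inputs are made up to show that digits are kept while spaces and punctuation are removed.

```ruby
# Hypothetical helper: strips everything except Unicode letters,
# decimal digits, and combining marks.
def keep_letters_digits_marks(input)
  input.gsub(/[^\p{L}\p{Nd}\p{M}]+/, '')
end

puts keep_letters_digits_marks('कंप्यूटर 101!')
# => कंप्यूटर101
puts keep_letters_digits_marks('naïve_user-42')
# => naïveuser42
```

Note that the underscore is removed as well, since it is neither a letter, a digit, nor a mark.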