Please help me with the following confusion:
qdapRegex::rm_nchar_words("è ûé", "1,2")
[1] "è ûé"
qdapRegex::rm_nchar_words('k ku ppp d', "1,2")
[1] "ppp"
Why does the first call not return "" while the second one works as expected? What am I missing here? The only thing I can think of is that the string in the first call is built from non-English letters.
Any solution?
As mentioned by the author of the package:
It uses \w to define letters, which is defined as [A-Za-z0-9_].
You would need to write your own custom regex to handle the non-ASCII letters.
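As a rough illustration of what such a custom regex could look like, here is a minimal base R sketch. The helper rm_short_words is hypothetical (it is not part of qdapRegex), and it assumes your R build's PCRE supports Unicode properties such as \p{L}. It treats any Unicode letter or underscore as a word character and removes words whose length falls in the given range:

```r
# Hypothetical helper, not part of qdapRegex: remove words made of
# n_min..n_max "letters", where a letter is any Unicode letter (\p{L})
# or an underscore.
rm_short_words <- function(x, n_min = 1, n_max = 2) {
  # Mark non-ASCII input as UTF-8 so PCRE runs in UTF mode
  x <- enc2utf8(x)
  # A run of n_min..n_max letters that is not preceded or followed
  # by another letter (i.e. a whole word of that length)
  pattern <- sprintf(
    "(?<![\\p{L}_])[\\p{L}_]{%d,%d}(?![\\p{L}_])",
    n_min, n_max
  )
  out <- gsub(pattern, "", x, perl = TRUE)
  # Collapse the leftover whitespace and trim both ends
  trimws(gsub("\\s+", " ", out))
}

rm_short_words("\u00e8 \u00fb\u00e9")   # same string as "è ûé"
rm_short_words("k ku ppp d")
```

The lookarounds are there so that only whole words in the length range are removed, rather than the first one or two letters of a longer word.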
UPDATE:
On my Win 7 machine the output is as expected.
One possible way to solve it is to use the pattern "[\\pL_]"
(any letter in any language, plus the underscore):
rm_nchar_words("è ûé", "1,2", pattern = "[\\pL_]")
Locale on Win machine:
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
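Since the result seems to differ between machines, it may help to compare the locale and string encoding on each one. The following is only a diagnostic sketch using base R functions; my assumption (not confirmed by the package author) is that whether \w matches accented letters can depend on the locale and on whether the string is marked as UTF-8:

```r
# Character-type locale of the current session
ctype <- Sys.getlocale("LC_CTYPE")
ctype

# TRUE if the session runs in a UTF-8 locale
is_utf8 <- l10n_info()[["UTF-8"]]
is_utf8

# Declared encoding of the test string ("è ûé")
x <- "\u00e8 \u00fb\u00e9"
Encoding(x)

# Compare how \w and \p{L} treat an accented letter on this machine
grepl("\\w", "\u00e8", perl = TRUE)
grepl("[\\p{L}_]", "\u00e8", perl = TRUE)
```

If the two grepl calls disagree on your machine, that would be consistent with \w being limited to ASCII there.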
I will keep investigating this and post updates to my answer.
UPDATE 2:
rm_nchar_words("è ûé", "1,2", pattern = "[\\pL_]")
""
works on my Ubuntu 18.04 machine as well.