[SOLVED] Same regex behaves differently on grepl versus stri_detect

Same regex behaves differently on grepl versus stri_detect_regex

Edit: I encountered this on R version 3.6.1. Apparently the issue does not exist in newer versions, where the two functions behave the same.

Consider this vector: the first element is in the Latin-1 Supplement Unicode block, the second is in the Latin Extended Additional block, and elements 3-6 are in the Latin Extended-D block (I see the same for the Latin Extended-E block). The regular expression used is ^[\\p{L} ]+$, which is supposed to match a string made up only of letters from any language, plus spaces. grepl and stri_detect_regex appear to interpret \p{L} differently.

library(stringi)  # provides stri_detect_regex

v <- c("é", "Ḃ", "Ꞵ", "ꞵ", "Ꞷ", "ꞷ", "keepme", "remove$me", "remove.me")

v[grepl("^[\\p{L} ]+$", v, perl = TRUE)]
# [1] "é"      "Ḃ"      "keepme"

v[stri_detect_regex(v, "^[\\p{L} ]+$")]
# [1] "é"      "Ḃ"      "\ua7b4" "\ua7b5" "\ua7b6" "\ua7b7" "keepme"
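
To narrow down where the difference comes from, it may help to look at the code points involved and at the library versions behind each function. The snippet below is only a diagnostic sketch: it assumes the difference is related to the Unicode data shipped with PCRE (used by grepl with perl = TRUE) versus ICU (used by stringi), which I have not confirmed against the R sources.

```r
# Code points of the four characters that the two functions disagree on
chars <- c("\ua7b4", "\ua7b5", "\ua7b6", "\ua7b7")
codepoints <- sprintf("U+%04X", vapply(chars, utf8ToInt, integer(1), USE.NAMES = FALSE))
print(codepoints)

# Version of the PCRE library this R build uses for perl = TRUE
print(extSoftVersion()[["PCRE"]])

# Unicode version known to the ICU library behind stringi (if installed)
if (requireNamespace("stringi", quietly = TRUE)) {
  print(stringi::stri_info()$Unicode.version)
}
```

Comparing this output between R 3.6.1 and a newer installation should show whether the underlying library versions differ.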

Is there any documentation on why they behave differently on this expression?

Solution

This happens on older R versions: in R 3.6.1, base grepl does not recognize letters from all Unicode blocks with the regex \p{L}. However, as @Oliver commented, it works as expected in later versions of R; he tested it in R 4.2.1. For me the question is answered. Thanks!
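
For code that still has to run on an older R, one possible workaround is to probe the behaviour at run time instead of hard-coding a version number, since I only know that 3.6.1 is affected and 4.2.1 is not. This is a minimal sketch: keep_letter_strings is a hypothetical helper name, and falling back to stringi is my assumption about what a reasonable fallback looks like.

```r
# Probe character: U+A7B4, one of the letters base grepl missed on R 3.6.1
probe <- "\ua7b4"

# TRUE if this R build's PCRE classifies the probe as a letter
pcre_knows_probe <- grepl("^\\p{L}$", probe, perl = TRUE)

# Hypothetical helper: keep only strings made of letters and spaces
keep_letter_strings <- function(x) {
  pattern <- "^[\\p{L} ]+$"
  if (pcre_knows_probe) {
    # Base R handles the probe, so grepl is used
    x[grepl(pattern, x, perl = TRUE)]
  } else if (requireNamespace("stringi", quietly = TRUE)) {
    # Fall back to stringi's ICU-based regex engine
    x[stringi::stri_detect_regex(x, pattern)]
  } else {
    stop("PCRE does not treat U+A7B4 as a letter and stringi is not installed")
  }
}

result <- keep_letter_strings(c("keepme", "remove$me", "remove.me"))
print(result)
```

The probe only covers one character, so it is a heuristic and not a guarantee that every letter is recognized.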