I am struggling with a regex. I use R
for my example, but any compatible regex is welcome.
Here is the problem. Consider this example:
test <- c("truc/truc/plouf.xlsx","plouf.xlsx","truc/plouf.xlsx")
I would like to have plouf
extracted each time.
I tried:
library(stringr)
str_extract(test,"(?<=\\/{0,1}).+(?=\\.xlsx)")
Which gives me
[1] "truc/truc/plouf" "plouf" "truc/plouf"
I naively thought that using a lazy .+?
in str_extract(test,"(?<=\\/{0,1}).+?(?=\\.xlsx)")
would solve the problem, but it does not.
How can I do this?
To extract the final filename (one that has an extension), you should be OK using:
library(stringr)
test <- c("truc/truc/plouf.xlsx", "plouf.xlsx", "truc/plouf.xlsx")
files <- str_extract(test, "[^/]+(?=\\.\\w+$)")
files
[1] "plouf" "plouf" "plouf"
The regex pattern here says to match:
[^/]+
match one or more characters that are not a slash
(?=\\.\\w+$)
until looking ahead and seeing the file extension at the end of the string (but don't match it)
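As for why the lazy .+? did not help: with an optional slash, the lookbehind can succeed at the very start of the string, and a regex engine reports the leftmost match it can find. Laziness only affects where a match ends, not where it starts.

If a regex is not a hard requirement, base R can probably do the same job. The sketch below compares the pattern above with basename() plus tools::file_path_sans_ext(); the variable names from_regex and from_base are just mine for illustration.

```r
library(stringr)

test <- c("truc/truc/plouf.xlsx", "plouf.xlsx", "truc/plouf.xlsx")

# Regex approach: non-slash characters followed by a final ".ext"
from_regex <- str_extract(test, "[^/]+(?=\\.\\w+$)")

# Base R approach: drop the directory part, then drop the extension
from_base <- tools::file_path_sans_ext(basename(test))

from_regex
from_base
```

Both approaches only strip the last extension, so a name like archive.tar.gz would keep its .tar part.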