
Regex: match shortest pattern between two possible delimiters


I am struggling with a regex. I use R for my example, but any compatible regex is welcome.

Here is the problem. Consider this example:

test <- c("truc/truc/plouf.xlsx","plouf.xlsx","truc/plouf.xlsx")

I would like to have plouf extracted each time. I tried:

library(stringr)
str_extract(test,"(?<=\\/{0,1}).+(?=\\.xlsx)") 

Which gives me

[1] "truc/truc/plouf" "plouf" "truc/plouf"

I naively thought that using a lazy .+? in str_extract(test,"(?<=\\/{0,1}).+?(?=\\.xlsx)") would solve the problem, but it does not.

How should I do this?


Solution

  • To extract the last path component (a filename with an extension), minus that extension, you can use:

    library(stringr)
    
    test <- c("truc/truc/plouf.xlsx", "plouf.xlsx", "truc/plouf.xlsx")
    files <- str_extract(test, "[^/]+(?=\\.\\w+$)")
    files
    
    [1] "plouf" "plouf" "plouf"
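
    The same pattern can also be tried without stringr. This is a minimal sketch in base R, not part of the original answer: it assumes perl = TRUE so that the lookahead is supported, and the names pattern and files_base are only illustrative.

    ```r
    test <- c("truc/truc/plouf.xlsx", "plouf.xlsx", "truc/plouf.xlsx")

    # Same pattern as above: one or more non-slash characters,
    # followed (without consuming it) by a dot, word characters, and end of string.
    pattern <- "[^/]+(?=\\.\\w+$)"

    # regexpr() locates the first match in each string; regmatches() extracts it.
    files_base <- regmatches(test, regexpr(pattern, test, perl = TRUE))
    files_base
    ```

    One difference to keep in mind: regmatches() drops elements that have no match, whereas str_extract() returns NA for them, so the result can be shorter than the input.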
    

    The regex pattern here says to match: