Tags: regex, stringr, gsub

Regex that extracts values between specific characters at beginning and end of string


I have a comment field in a dataset from which I need to extract some numbers. A typical string looks like the one below. From it, I want to extract series120_count = 1 and crossing_success = 2.

x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

I've tried a few things but can't quite get it. For example, my attempt to isolate series120_count is below, but it's not quite there yet.

str_extract(x, "(?<=series120_count)(.+)(?=\\; )")
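The attempt fails because .+ is greedy. After the lookbehind it first runs to the end of the string, then gives characters back only until the lookahead finds "; ". That happens at the last "; " in the string, not the first one. Below is a minimal sketch, assuming stringr is installed, that compares the attempt with a variant. The variant moves the "[" into the lookbehind and takes only digits up to the closing "]":

```r
library(stringr)

x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

# Original attempt: greedy .+ backtracks only as far as the LAST "; "
attempt <- str_extract(x, "(?<=series120_count)(.+)(?=\\; )")
print(attempt)
#> [1] "[1]; crossing_success[2]"

# Put "[" inside the lookbehind and stop at the first "]" instead
fixed <- str_extract(x, "(?<=series120_count\\[)\\d+(?=\\])")
print(fixed)
#> [1] "1"
```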

Ideally, I would like a pattern that starts matching right after "series120_count[" and stops at the closing bracket "]". I'd also like to reuse it for crossing_success by simply swapping that prefix for "crossing_success[".
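One way to get this swap-the-prefix behaviour is to build the pattern from the field name. The sketch below uses a hypothetical helper, extract_field(). It assumes field names are plain identifiers (letters, digits, underscores), so they can be pasted into the pattern without escaping. The helper returns whatever sits between "field[" and the next "]", so it also works for non-numeric fields such as tag_comments. Because str_extract() is vectorised, the helper can be applied to a whole column. The two data frame rows are made up for illustration.

```r
library(stringr)

# Hypothetical helper: text between "<field>[" and the next "]".
# Assumes `field` contains no regex metacharacters.
extract_field <- function(x, field) {
  pattern <- paste0("(?<=", field, "\\[)[^\\]]*(?=\\])")
  str_extract(x, pattern)
}

x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

count   <- as.integer(extract_field(x, "series120_count"))
success <- as.integer(extract_field(x, "crossing_success"))
tag     <- extract_field(x, "tag_comments")

# Applied to a column; rows without the field give NA
df <- data.frame(
  comments = c("series120_count[1]; crossing_success[2]; tag_comments[small]",
               "series120_count[3]; tag_comments[none]"),  # hypothetical rows
  stringsAsFactors = FALSE
)
df$crossing_success <- as.integer(extract_field(df$comments, "crossing_success"))
```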


Solution

  • If you want to use the lookbehind assertion for both strings and extract the digits, you can use:

    \b(?<=crossing_success\[|series120_count\[)\d+(?=])
    

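    The pattern matches:

      • \b A word boundary
      • (?<=crossing_success\[|series120_count\[) Positive lookbehind, assert that either crossing_success[ or series120_count[ is directly to the left
      • \d+ Match 1 or more digits
      • (?=]) Positive lookahead, assert that ] is directly to the right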


    library(stringr)
    
    x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"
    pattern <- "\\b(?<=crossing_success\\[|series120_count\\[)\\d+(?=])"
    matches <- str_extract_all(x, pattern)
    print(matches)
    

    Output

    [[1]]
    [1] "1" "2"
    

    Alternatively, you can use a capture group. The digits are then in group 1:

    \b(?:crossing_success|series120_count)\[(\d+)]
    

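In R, the capture-group version pairs with str_match_all() rather than str_extract_all(). The reason is that str_extract_all() returns the whole match, including the field name and brackets. The minimal sketch below assumes stringr and uses the capture-group pattern with the closing bracket escaped, which is equivalent. A small variation also captures the field name (a capturing group instead of (?:...)), so the values come back named.

```r
library(stringr)

x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

# Group 1 is the run of digits inside the brackets
pattern <- "\\b(?:crossing_success|series120_count)\\[(\\d+)\\]"
m <- str_match_all(x, pattern)[[1]]
# Column 1 holds the full match, column 2 holds capture group 1
values <- m[, 2]
print(values)
#> [1] "1" "2"

# Variation: also capture the field name and use it to name the values
pattern_named <- "\\b(crossing_success|series120_count)\\[(\\d+)\\]"
m2 <- str_match_all(x, pattern_named)[[1]]
named_values <- setNames(as.integer(m2[, 3]), m2[, 2])
print(named_values)
```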