I have a comment field in a dataset that I need to extract some numbers from. The string looks like the one below. I want to extract the values so that series120_count = 1 and crossing_success = 2.
x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"
I've tried a few things but can't quite get it. For example, my attempt to isolate series120_count is below, but it's not quite there yet.
str_extract(x, "(?<=series120_count)(.+)(?=\\; )")
Ideally, I would like something that matches "series120_count[" at the start and ends where the bracket closes with "]". I'd also like to be able to get the crossing_success value by just swapping out the first part of the match for "crossing_success[".
If you want to use the lookbehind assertion for both strings and extract the digits, you can use:
\b(?<=crossing_success\[|series120_count\[)\d+(?=])
The pattern matches:
\b : A word boundary. Note that it is checked at the position where the digits start (right after the [), so it does not anchor the start of the key name
(?<=crossing_success\[|series120_count\[) : Positive lookbehind, assert one of the alternatives to the left
\d+ : Match 1+ digits
(?=]) : Positive lookahead, assert ] to the right
library(stringr)
x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"
pattern <- "\\b(?<=crossing_success\\[|series120_count\\[)\\d+(?=])"
matches <- str_extract_all(x, pattern)
print(matches)
Output
[[1]]
[1] "1" "2"
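Since the question asks for a version where the key can be swapped out, one possible sketch is to wrap the same lookbehind pattern in a small function. The function name extract_bracket_number is hypothetical (it is not part of stringr), and the sketch assumes the key contains only letters, digits and underscores, so it needs no regex escaping.

```r
library(stringr)

# Hypothetical helper (not a stringr function): builds the lookbehind
# pattern shown above for a single key and returns the digits inside [].
# Assumption: `key` has no regex metacharacters (letters, digits, _ only).
extract_bracket_number <- function(x, key) {
  pattern <- paste0("\\b(?<=", key, "\\[)\\d+(?=\\])")
  str_extract(x, pattern)
}

x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

extract_bracket_number(x, "series120_count")   # "1"
extract_bracket_number(x, "crossing_success")  # "2"
extract_bracket_number(x, "tag_comments")      # NA, "small" is not digits
```

The result is a character value, so wrap it in as.integer() if a number is needed.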
Alternatively, you can use a capture group and take the value of group 1:
\b(?:crossing_success|series120_count)\[(\d+)]
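As a rough sketch of how the capture group version might be used from R: str_match_all() returns one matrix per input string, with the full match in the first column and each capture group in the following columns. The variable names m and digits are only illustrative, and the closing bracket is escaped here purely for explicitness.

```r
library(stringr)

x <- "series120_count[1]; crossing_success[2]; tag_comments[small]"

# Same pattern as above, with backslashes doubled for an R string.
pattern <- "\\b(?:crossing_success|series120_count)\\[(\\d+)\\]"

# One matrix per input string: column 1 is the full match,
# column 2 is capture group 1 (the digits).
m <- str_match_all(x, pattern)[[1]]
digits <- m[, 2]
print(digits)
```

Here \b sits in front of the key name, so a longer name that merely ends in series120_count or crossing_success is not matched.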