I have a string from which I want to extract the street and the city; in this example, these would be 'Elland Rd' and 'Leeds'.
mystring = "0000\" club_info=\"Elland Rd, Leeds\" Pitch=\"100x50\""
city = gsub(".* club_info=\"(.*),(.+)\.*", "\\2", mystring) #cant get this part to work
My theory for getting the city is to match everything after the comma up to the backslash, but I can't seem to get it to recognize the backslash.
I prefer strcapture to extract multiple patterns instead of repeated gsub-ing; how about this?
strcapture('.*club_info="([^"]+),([^"]+)".(.*)', mystring, list(x1="", x2="", x3=""))
#          x1     x2             x3
# 1 Elland Rd  Leeds Pitch="100x50"
(It was not required to include the Pitch= part in there, but I thought you might use it, since it appears you're doing reductive gsub-ing.)
FYI, x2 here has a leading space. It could be handled in the regex, but if you are not 100% positive the space is there in all cases, it might be simpler to apply trimws to the result, as in
strcapture('.*club_info="([^"]+),([^"]+)".(.*)', mystring, list(x1="", x2="", x3="")) |>
lapply(trimws)
# $x1
# [1] "Elland Rd"
# $x2
# [1] "Leeds"
# $x3
# [1] "Pitch=\"100x50\""
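For comparison, here is a minimal sketch of handling the space in the regex instead. It assumes the only unwanted whitespace is zero or more spaces directly after the comma; spaces elsewhere (for example before the closing quote) would still need trimws.

```r
mystring <- "0000\" club_info=\"Elland Rd, Leeds\" Pitch=\"100x50\""

# ", *" consumes the comma plus any spaces that follow it (zero or more),
# so the second capture group starts at the first non-space character
out <- strcapture('.*club_info="([^"]+), *([^"]+)".(.*)', mystring,
                  list(x1 = "", x2 = "", x3 = ""))
out$x2
# [1] "Leeds"
```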
In this case the result does drop from a data.frame to a list, but I'm not certain you need a frame; a named list should suffice. If you really want it as a frame (and many of my use-cases really prefer that), just add |> as.data.frame() to the pipe.
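As a sketch, the full pipe with that extra step looks like this (the native pipe |> needs R 4.1 or later; the name res is only for illustration):

```r
mystring <- "0000\" club_info=\"Elland Rd, Leeds\" Pitch=\"100x50\""

res <- strcapture('.*club_info="([^"]+),([^"]+)".(.*)', mystring,
                  list(x1 = "", x2 = "", x3 = "")) |>
  lapply(trimws) |>      # trims each column, returns a named list
  as.data.frame()        # converts the named list back to a one-row frame

is.data.frame(res)
# [1] TRUE
```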
Regex walk-through.
.*club_info="([^"]+),([^"]+)".(.*)
^^                                   leading/trailing text, discarded
  ^^^^^^^^^^^                        literal text
              [^"]+   [^"]+          one or more "any character except dquote"
             (     ),(     )         two capture-groups
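The same character class can also repair the original single-value approach from the question. This is only a sketch: it uses sub rather than gsub (one replacement is enough), and it assumes the city is followed directly by the closing double quote. Note that the \" in the R source is just an escape, so the stored string contains no backslash to search for.

```r
mystring <- "0000\" club_info=\"Elland Rd, Leeds\" Pitch=\"100x50\""

# The stored string has double quotes but no backslashes
grepl("\\", mystring, fixed = TRUE)
# [1] FALSE

# Match the whole string and keep only the capture group;
# [^"]+ stops at the closing double quote
city <- sub('.*club_info="[^"]+, *([^"]+)".*', "\\1", mystring)
city
# [1] "Leeds"
```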
Also, since we know the pattern contains double quotes and no single quotes, I chose single quotes as the outer string delimiters. If a pattern has both, or if you want to avoid double-backslashes and the like, we can use R's "raw strings" (available since R 4.0.0) instead,
r"{.*club_info="([^"]+),([^"]+)".(.*)}"
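where the r"{ and }" are the open/close delimiters. I chose braces here since parens are visually confusing next to the regex parens. Brackets (r"[ and ]") also work. Parens (r"( and )") work in general, but this particular pattern contains the sequence )" which would end a paren-delimited raw string early; adding matching dashes to both delimiters, as in r"-( and )-", avoids that.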
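A quick check that the spellings produce the same pattern (a sketch; raw strings need R 4.0.0 or later, and the variable names are only for illustration). A raw string ends at the first closing delimiter followed by a double quote, which is why the paren version below uses the dash form.

```r
pat_single  <- '.*club_info="([^"]+),([^"]+)".(.*)'
pat_brace   <- r"{.*club_info="([^"]+),([^"]+)".(.*)}"
pat_bracket <- r"[.*club_info="([^"]+),([^"]+)".(.*)]"
# The pattern contains )" so plain r"( ... )" would close too early;
# one dash on each side changes the closing sequence to )-"
pat_paren   <- r"-(.*club_info="([^"]+),([^"]+)".(.*))-"

all(pat_single == c(pat_brace, pat_bracket, pat_paren))
# [1] TRUE
```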