I have the following string:
def str='prop1: value1, prop2: value2;value3, prop3:"test:1234, test1:23;45, test4:34;34", prop4: "test, 66;77, 888"'
What I want to end up with is the following list of pairs:
prop1: value1
prop2: value2;value3
prop3: test:1234, test1:23;45, test4:34;34
prop4: test, 66;77, 888
I figure that if I can first parse and strip out prop3 and prop4, I can simply split the rest of the string on commas. However, I am having trouble getting a match for prop4.
The following is the code and regex I have tried so far. Commented out in the code are other regexes I have tried; none of them extracts the last property, prop4.
def str='prop1: value1, prop2: value2;value3, prop3:"test:1234, test1:23;45, test4:34;34", prop4: "test, 66;77, 888"'
//def regex = /(\w+):"(.*)"[,\s$]/
//def regex = /(\w+):"(.*)"[,|\s|$]/
def regex = /(\w+):"(.*)"[,\s]|$/
def m = (str =~ regex)
(0..<m.count).each{
println("${m[it][1]}=${m[it][2]}")
}
This returns:
prop3=test:1234, test1:23;45, test4:34;34
null=null
What am I missing here?
(Also, is there a way to parse all of this in a single regex pass, as opposed to my approach described above of regex first, then split?)
Based on your given example data, the following regex would work:
\b(\w+):\s*(\"[^\"]*\"|[^,\"]*)
RegEx details:
\b : Word boundary
(\w+) : Capture group #1 to match 1+ word characters
: : Match a :
\s* : 0 or more whitespaces
( : Start capture group #2
\"[^\"]*\" : Match a quoted text
| : OR
[^,\"]* : Match 0 or more of any char that is not , or "
) : End capture group #2
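As a rough sketch of how this regex could be applied in Groovy in a single pass, the snippet below collects the pairs into a map. The `pairs` map and the quote-stripping step are illustrative additions, not part of the original code, and the input is the string from the question's code sample.

```groovy
// Input string taken from the code sample in the question
def str = 'prop1: value1, prop2: value2;value3, prop3:"test:1234, test1:23;45, test4:34;34", prop4: "test, 66;77, 888"'

// Group 1: property name
// Group 2: either a quoted value, or a run of characters containing no comma or quote
def regex = /\b(\w+):\s*(\"[^\"]*\"|[^,\"]*)/

def pairs = [:]          // hypothetical result holder; a map literal keeps insertion order
def m = (str =~ regex)
while (m.find()) {
    // Remove one leading and one trailing double quote, if present
    def value = m.group(2).replaceAll('^"|"$', '')
    pairs[m.group(1)] = value.trim()
}

pairs.each { k, v -> println "${k}: ${v}" }
// prop1: value1
// prop2: value2;value3
// prop3: test:1234, test1:23;45, test4:34;34
// prop4: test, 66;77, 888
```

As for what the original pattern /(\w+):"(.*)"[,\s]|$/ was missing: alternation has the lowest precedence, so |$ makes the whole pattern mean "the quoted-property pattern, or end of string". The second match is the empty match at the end of the string, where no group took part, which is why null=null is printed. prop4 is never matched because there is a space between the colon and the opening quote, which :" does not allow, and because its closing quote is the last character of the string, so no comma or whitespace follows it.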