I'm trying to parse iCalendar (RFC2445) input using a regex.
Here's a [simplified] example of what the input looks like:
BEGIN:VEVENT
abc:123
def:456
END:VEVENT
BEGIN:VEVENT
ghi:789
END:VEVENT
I'd like to get an array of matches: the "outer" match is each VEVENT block and the inner matches are each of the field:value pairs.
I've tried variants of this:
BEGIN:VEVENT\n((?<field>(?<name>\S+):\s*(?<value>\S+)\n)+?)END:VEVENT
But given the input above, the result seems to have only ONE field for each matching VEVENT, despite the +? on the capture group:
**Match 1**
field def:456
name def
value 456
**Match 2**
field ghi:789
name ghi
value 789
In the first match, I would have expected TWO fields: the abc:123 and the def:456 matches...
I'm sure this is a newbie mistake (I seem to perpetually be a newbie when it comes to regexes...), but maybe you can point me in the right direction?
Thanks!
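In most regex engines (Ruby's included), a repeated capture group keeps only the text from its last iteration, which is why you see just def:456 in the first match. You need to split your regex into two: one matching a VEVENT block and one matching the name/value pairs. You can then use nested scan calls to find all occurrences, e.g.
str.scan(/BEGIN:VEVENT(?<vevent>.+?)END:VEVENT/m) do
  $~[:vevent].scan(/(?<field>(?<name>\S+?):\s*(?<value>\S+))/) do
    p $~[:field], $~[:name], $~[:value]
  end
end
where str is your input. Note that the value group has to be greedy (\S+): a lazy \S+? at the very end of a pattern stops after a single character. This outputs:
"abc:123"
"abc"
"123"
"def:456"
"def"
"456"
"ghi:789"
"ghi"
"789"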
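Since the question asks for an array of matches rather than printed output, here is a minimal sketch of the same two-pass idea that collects one hash per VEVENT. The method name parse_vevents is hypothetical, and the pattern assumes (as in the simplified sample) that values contain no whitespace; real iCalendar values often do, so the value group would need loosening there.

```ruby
# Hypothetical helper: returns one Hash of name => value per VEVENT block.
def parse_vevents(str)
  # Pass 1: grab the body of every VEVENT (/m lets . match newlines in Ruby).
  str.scan(/BEGIN:VEVENT(.+?)END:VEVENT/m).map do |(body)|
    # Pass 2: grab every name:value pair inside that body.
    # The value group is greedy so the whole value is captured.
    body.scan(/^(\S+?):\s*(\S+)/).to_h
  end
end

input = <<~ICS
  BEGIN:VEVENT
  abc:123
  def:456
  END:VEVENT
  BEGIN:VEVENT
  ghi:789
  END:VEVENT
ICS

events = parse_vevents(input)
# events[0] holds both abc and def; events[1] holds ghi.
```

Using scan without a block returns the captures directly, so there is no need to touch $~ at all in this variant.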
If you want to make the code more readable, I suggest you require 'English' (with a capital E) and replace $~ with $LAST_MATCH_INFO.
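For illustration, the nested scan could look roughly like this with the readable alias. The fields array and the sample input are assumptions added here so the snippet runs on its own; the value group is greedy so the full value is captured.

```ruby
require 'English' # standard library; the file name is capitalised

str = <<~ICS
  BEGIN:VEVENT
  abc:123
  def:456
  END:VEVENT
  BEGIN:VEVENT
  ghi:789
  END:VEVENT
ICS

fields = []
str.scan(/BEGIN:VEVENT(?<vevent>.+?)END:VEVENT/m) do
  # $LAST_MATCH_INFO is an alias for $~, the MatchData of the latest match.
  $LAST_MATCH_INFO[:vevent].scan(/(?<name>\S+?):\s*(?<value>\S+)/) do
    fields << [$LAST_MATCH_INFO[:name], $LAST_MATCH_INFO[:value]]
  end
end
```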