I'm running the example from slide 15:
qr{
<data>
<rule: data> <[text]>+
<rule: text> .+
}xm;
When running against a multi-line text:
line_1
line_2
I get:
'text' => [ 'line_1',
'
line_2' ]
and so far I have not managed to get rid of the '\n' captured in front of the second line.
Running Regexp::Grammars 1.048 on top of Strawberry Perl 5.26.1.
Update / clarification: Having (prematurely, sorry!) raised a bug against the module, Damian clarified as follows (reply slightly adapted to match the example above):
A rule with whitespace within it matches any whitespace (including newlines) in the input at that point. So a rule like:
<rule: text> .+
is really equivalent to:
<rule: text><.ws>.+
meaning: match-but-don't-capture any leading whitespace, then match any-characters-except-newline.
If you want whitespace inside the rule to be ignored (as you seem to want here), then you need to declare the rule as a token instead. Tokens don't have the magical "whitespace-matches-whitespace" behaviour of rules. Hence you would write:
<token: line> .+
in which case you will also need to explicitly consume the newlines separating each line, with something like:
<rule: data> <[line]>+ % \n
This works:
qr{
<data>
<rule: data> <[text]>+ % [\r\n]+
<rule: text> .+
}xm;
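To try it out, the grammar can be wrapped in a small script along these lines. This is only a sketch: the sample input string and the use of Data::Dumper are my assumptions, not part of the original example. Regexp::Grammars leaves the result of a successful match in %/, so the captured lines end up under $/{data}{text}.

```perl
use strict;
use warnings;
use Data::Dumper;
use Regexp::Grammars;   # must be loaded in the scope where the grammar is compiled

my $parser = qr{
    <data>
    <rule: data> <[text]>+ % [\r\n]+
    <rule: text> .+
}xm;

# Assumed sample input: two lines separated by a single newline.
my $input = "line_1\nline_2";

if ($input =~ $parser) {
    # The result hash %/ is filled in by Regexp::Grammars on success.
    print Dumper($/{data}{text});
}
else {
    warn "no match\n";
}
```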
The lines of data are meant to be separated by EOL character(s), which is what [\r\n]+ specifies. Note: Windows files typically end each line with both a carriage return \r and a line feed \n, hence the [\r\n]+ pattern. You can read more about this by running perldoc Regexp::Grammars and searching for "separator".
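To see what the [\r\n]+ pattern accepts on its own, here is a small core-Perl sketch. It uses plain split rather than Regexp::Grammars, and split_lines is a hypothetical helper name of mine, so treat it as an illustration of the separator pattern only, not of the grammar engine.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Grammars): split text on the
# same separator pattern that the grammar uses between lines.
sub split_lines {
    my ($text) = @_;
    return split /[\r\n]+/, $text;
}

my @unix    = split_lines("line_1\nline_2");       # LF only
my @windows = split_lines("line_1\r\nline_2");     # CR followed by LF
my @gappy   = split_lines("line_1\n\n\nline_2");   # blank lines in between

# Each input yields the same two fields.
print scalar(@unix), " ", scalar(@windows), " ", scalar(@gappy), "\n";   # prints "2 2 2"
```

Because the + accepts any run of EOL characters, blank lines between two lines are swallowed by this pattern as well. If blank lines matter in your data, a stricter separator such as \r?\n may be a better fit.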