I'm still pretty new to PEG.js, and I'm guessing this is just a beginner misunderstanding.
In trying to parse something like this:
definitions
some text
if
some additonal text
to parse here
then
still more text will
go here
I can get a grammar to properly read the three sections (to be further parsed later, of course). But it generates that text in an odd format. For instance, in the above, "some text" turns into
[
[undefined, "s"], [undefined, "o"], [undefined, "m"], [undefined, "e"], [undefined, " "],
[undefined, "t"], [undefined, "e"], [undefined, "x"], [undefined, "t"]
]
I can easily enough convert this to a plain string, but I'm wondering what I'm doing to give it that awful format. This is my grammar so far:
{
  const combine = (xs) => xs .map (x => x[1]) .join('')
}

MainObject
  = _ defs:DefSection _ condition:CondSection _ consequent: ConsequentSection
    {return {defs, condition, consequent}}

DefSection = _ "definitions"i _ defs:(!"\nif" .)+
    {return defs}

CondSection = _ "if"i _ cond:(!"\nthen" .)+
    {return combine (cond)}

ConsequentSection = _ "then"i _ cons:.*
    {return cons .join ('')}

_ "whitespace"
  = [ \t\n\r]*
I can fix it by replacing {return defs} with {return combine(defs)} as in the other sections.
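For context, this is all the helper does. It is a plain JavaScript sketch run outside the grammar, with the array copied from the output shown earlier:

```javascript
// Shape produced for the input "some text":
// one [undefined, character] pair per matched character.
const defs = [
  [undefined, "s"], [undefined, "o"], [undefined, "m"], [undefined, "e"], [undefined, " "],
  [undefined, "t"], [undefined, "e"], [undefined, "x"], [undefined, "t"]
];

// Same helper as in the grammar initializer:
// keep the second element of each pair, then join the characters.
const combine = (xs) => xs.map((x) => x[1]).join('');

console.log(combine(defs)); // prints "some text"
```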
My main question is simply: why does it generate that output? And is there a simpler way to fix it?
Overall, as I'm still pretty new to PEG.js, I would love to know if there is a better way to write this grammar. Expressions like (!"\nif" .)+ seem fairly sketchy.
!Rule will always return undefined, and will fail if Rule matches.
A lone . will always match a single character.
Rule1 Rule2 ... will create a list with the results of each rule.
Rule+ or Rule* will match Rule as many times as possible and create a list. (+ fails if the first attempt to match Rule fails.)
Your results are
[ // Start of (!"\nif" .)+
  [
    undefined, // First !"\nif"
    "s"        // First .
  ], // First (!"\nif" .)
  [undefined, "o"], // Second (!"\nif" .)
  [undefined, "m"], [undefined, "e"], [undefined, " "],
  [undefined, "t"], [undefined, "e"], [undefined, "x"], [undefined, "t"]
] // This list is (!"\nif" .)+, all the matches of (!"\nif" .)
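To see how those rules combine into that shape, here is a rough sketch in plain JavaScript. The matcher functions (literal, any, not, seq, plus) are hypothetical names made up for this illustration; they are not PEG.js internals, and the parser PEG.js actually generates looks quite different.

```javascript
// Hypothetical hand-written matchers, not PEG.js internals.
// Each matcher takes (input, pos) and returns { value, pos }, or null on failure.
const literal = (s) => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, pos: pos + s.length } : null;

// . : matches any single character.
const any = (input, pos) =>
  pos < input.length ? { value: input[pos], pos: pos + 1 } : null;

// !rule : succeeds without consuming input when rule fails; its value is undefined.
const not = (rule) => (input, pos) =>
  rule(input, pos) === null ? { value: undefined, pos } : null;

// rule1 rule2 ... : collects the result of each rule into a list.
const seq = (...rules) => (input, pos) => {
  const values = [];
  for (const rule of rules) {
    const r = rule(input, pos);
    if (r === null) return null;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// rule+ : matches as many times as possible; fails if the first attempt fails.
const plus = (rule) => (input, pos) => {
  const values = [];
  for (let r = rule(input, pos); r !== null; r = rule(input, pos)) {
    values.push(r.value);
    pos = r.pos;
  }
  return values.length > 0 ? { value: values, pos } : null;
};

// Equivalent of (!"\nif" .)+
const defsBody = plus(seq(not(literal("\nif")), any));
const result = defsBody("hi\nif", 0);
console.log(result.value); // two pairs: [undefined, "h"] and [undefined, "i"]
```

Each pass through the sequence contributes one two-element list, and the repetition wraps all of them in an outer list, which is exactly the nesting in your output.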
What you seem to want is the matched text itself. You can use the $ operator for this: $Rule returns the input text that Rule matched instead of the result that Rule produced.
MainObject
  = _ defs:DefSection _ condition:CondSection _ consequent: ConsequentSection
    {return {defs, condition, consequent}}

DefSection = _ "definitions"i _ defs:$(!"\nif" .)+
    {return defs.trim()}

CondSection = _ "if"i _ cond:$(!"\nthen" .)+
    {return cond.trim()}

ConsequentSection = _ "then"i _ cons:$(.*)
    {return cons.trim()}

_ "whitespace"
  = [ \t\n\r]*
This will produce
{
  "defs": "some text",
  "condition": "some additonal text\nto parse here",
  "consequent": "still more text will\ngo here"
}
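If it helps to picture what $ does, here is a small sketch in plain JavaScript. Both untilIf and text are hypothetical stand-ins written for this illustration, not PEG.js functions: the idea is simply that $ runs the rule, throws its result away, and returns the slice of input the rule consumed.

```javascript
// Hypothetical stand-in for (!"\nif" .)+ : consumes characters up to (not including)
// "\nif" and returns one [undefined, char] pair per character, or null if none match.
const untilIf = (input, pos) => {
  const pairs = [];
  while (pos < input.length && !input.startsWith("\nif", pos)) {
    pairs.push([undefined, input[pos]]);
    pos += 1;
  }
  return pairs.length > 0 ? { value: pairs, pos } : null;
};

// Sketch of $rule: run the rule, ignore its value, return the consumed text.
const text = (rule) => (input, pos) => {
  const r = rule(input, pos);
  return r === null ? null : { value: input.slice(pos, r.pos), pos: r.pos };
};

const input = "some text\nif";
console.log(untilIf(input, 0).value.length); // 9 pairs, one per character
console.log(text(untilIf)(input, 0).value);  // "some text"
```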