In a normal PEG (parsing expression grammar) this is a valid grammar:
values <- number (comma values)*
number <- [0-9]+
comma <- ','
However, if I try to write this using LPeg the recursive nature of that rule fails:
local lpeg = require'lpeg'
local comma = lpeg.P(',')
local number = lpeg.R('09')^1
local values = number * (comma * values)^-1
--> bad argument #2 to '?' (lpeg-pattern expected, got nil)
Although in this simple example I could rewrite the rule to not use recursion, I have some existing grammars that I'd prefer not to rewrite.
How can I write a self-referencing rule in LPeg?
Use a grammar.
With the use of Lua variables, it is possible to define patterns incrementally, with each new pattern using previously defined ones. However, this technique does not allow the definition of recursive patterns. For recursive patterns, we need real grammars.
LPeg represents grammars with tables, where each entry is a rule.
The call lpeg.V(v) creates a pattern that represents the nonterminal (or variable) with index v in a grammar. Because the grammar still does not exist when this function is evaluated, the result is an open reference to the respective rule.
A table is fixed when it is converted to a pattern (either by calling lpeg.P or by using it wherein a pattern is expected). Then every open reference created by lpeg.V(v) is corrected to refer to the rule indexed by v in the table.
When a table is fixed, the result is a pattern that matches its initial rule. The entry with index 1 in the table defines its initial rule. If that entry is a string, it is assumed to be the name of the initial rule. Otherwise, LPeg assumes that the entry 1 itself is the initial rule.
As an example, the following grammar matches strings of a's and b's that have the same number of a's and b's:
equalcount = lpeg.P{ "S"; -- initial rule name S = "a" * lpeg.V"B" + "b" * lpeg.V"A" + "", A = "a" * lpeg.V"S" + "b" * lpeg.V"A" * lpeg.V"A", B = "b" * lpeg.V"S" + "a" * lpeg.V"B" * lpeg.V"B", } * -1
It is equivalent to the following grammar in standard PEG notation:
S <- 'a' B / 'b' A / '' A <- 'a' S / 'b' A A B <- 'b' S / 'a' B B