Edit (to generalize the problem):
I'd like to parse a grammar, where
<prefix> ::= [a-z]*
<middle> ::= xxx
<suffix> ::= b+
<grammar> ::= <prefix><middle><suffix>
I expect (for example) the following words to pass: aaaaxxxbb
, axxxaaxxxbbb
, xxxxxxbb
Original post:
I expected the following parser to backtrack and find a solution in the end:
val before = P(AnyChar.rep.!)
val content = P("xxx".!)
val after = P("b".rep.!)
val all = P(before ~ content ~ after ~ End)
def test() = {
val r = all.parse("aaaaxxxbbb")
println(r)
}
Instead it looks like the before
part greedily parses all the text, and the parser fails without backtracking.
Am I missing something?
I was able to solve the issue in the end.
It's reasonable, that the parser won't backtrack within the regex, so I thought I should rewrite the AnyChar.rep
part as a recursive rule, like this:
val before: P[Any] = P(AnyChar | (AnyChar ~ before))
But that wasn't enough, fastparse still don't seem to backtrack.
I stumbled upon this question about parsing ambiguous grammar. So I tried using GLL combinators instead of Fastparse, and that made it work.
object TestParser1 extends Parsers with RegexParsers {
lazy val before: Parser[String] = (".".r | (".".r ~ before)) ^^ {
case a ~ b => a.toString + b.toString
case x => x.toString
}
lazy val content: Parser[String] = "xxx"
lazy val after: Parser[String] = "b+".r
lazy val all: Parser[String] = before ~ content ~ after ^^ {
case (b, c, a) => s"$b $c $a"
}
def test() = {
val r = all("aaaaxxxbbb")
r.toList foreach println
}
}