I wrote following parser using fastparse
library to get words in pairs. On execution it parses using parsePair
properly. But for parseMultiPairs
it goes into 100% CPU usage and then exitsx with OutOfMemoryError
.
import fastparse.*, MultiLineWhitespace.*
case class Pair(a: String, b: String)
def parsePair[$: P]: P[Pair] = P(CharIn("a-z").repX.! ~ CharIn("a-z").repX.!).map(Pair(_, _))
def parsePairs[$: P]: P[Seq[Pair]] = P(parsePair.rep ~ End)
println(parse("gnu linux", parsePair(_)))
println(parse("gnu linux", parsePairs(_)))
I thought maybe I am not writing the grammar correctly. So I wrote similar grammar in parsy
library of Python as:
from parsy import *
word = regex(r'[a-z]+')
pair = word.sep_by(regex(r' +'), min=2, max=2)
pairs = pair.sep_by(regex(r'[ \n]+'))
print(word.parse('hello'))
print(pair.parse('hello world'))
print(pairs.parse('hello world hey there\ngood bye'))
It works as expected and is able to parse all inputs. What change do I need with fastparse
to make it work?
Got a reply from Haoyi themself here on Discord. Issue with my fastparse
spec was my definition of word as CharIn("a-z").repX
which is 0 or more characters, instead of CharIn("a-z").repX(1)
which is 1 or more characters.
Following is the complete working code:
object Test extends App:
import fastparse.*, MultiLineWhitespace.*
case class Pair(a: String, b: String)
def parsePair[$: P]: P[Pair] = P(CharIn("a-z").repX(1).! ~ CharIn("a-z").repX(1).!).map(Pair(_, _))
def parseMultiPairs[$: P]: P[Seq[Pair]] = P(parsePair.rep ~ End)
println(parse("gnu linux", parsePair(_)))
println(parse("hello world hey there\ngood bye", parseMultiPairs(_)))
end Test