scalafastparse

Fastparse runs out of memory for simple rules


I wrote following parser using fastparse library to get words in pairs. On execution it parses using parsePair properly. But for parseMultiPairs it goes into 100% CPU usage and then exitsx with OutOfMemoryError.

import fastparse.*, MultiLineWhitespace.*

case class Pair(a: String, b: String)

def parsePair[$: P]: P[Pair] = P(CharIn("a-z").repX.! ~ CharIn("a-z").repX.!).map(Pair(_, _))

def parsePairs[$: P]: P[Seq[Pair]] = P(parsePair.rep ~ End)

println(parse("gnu linux", parsePair(_)))
println(parse("gnu linux", parsePairs(_)))

I thought maybe I am not writing the grammar correctly. So I wrote similar grammar in parsy library of Python as:

from parsy import *

word = regex(r'[a-z]+')
pair = word.sep_by(regex(r' +'), min=2, max=2)
pairs = pair.sep_by(regex(r'[ \n]+'))

print(word.parse('hello'))
print(pair.parse('hello world'))
print(pairs.parse('hello world hey there\ngood bye'))

It works as expected and is able to parse all inputs. What change do I need with fastparse to make it work?


Solution

  • Got a reply from Haoyi themself here on Discord. Issue with my fastparse spec was my definition of word as CharIn("a-z").repX which is 0 or more characters, instead of CharIn("a-z").repX(1) which is 1 or more characters.

    Following is the complete working code:

    object Test extends App:
        import fastparse.*, MultiLineWhitespace.*
    
        case class Pair(a: String, b: String)
    
        def parsePair[$: P]: P[Pair] = P(CharIn("a-z").repX(1).! ~ CharIn("a-z").repX(1).!).map(Pair(_, _))
    
        def parseMultiPairs[$: P]: P[Seq[Pair]] = P(parsePair.rep ~ End)
    
        println(parse("gnu linux", parsePair(_)))
        println(parse("hello world hey there\ngood bye", parseMultiPairs(_)))
    end Test