haskellmegaparsec

How to parse multiple lines with megaparsec? many . many runs into space leak


I'd like to parse some very simple text for example, "abcxyzzzz\nhello\n" into ["abcxyzzz", "hello"] :: String.

Not looking for a simpler function to do this (like words) as I need to parse something more complex and I'm just laying the foundations here.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeSynonymInstances #-}
{-# LANGUAGE FlexibleInstances #-}

module RgParse where

import Data.Text (Text)
import Text.Megaparsec
import Text.Megaparsec.Char


data SimpleData = SimpleData String deriving (Eq, Show, Ord)
data SimpleData' = SimpleData' [String] deriving (Eq, Show, Ord)

instance ShowErrorComponent SimpleData where
  showErrorComponent = show

instance ShowErrorComponent String where
  showErrorComponent = show



simple :: Parsec String Text SimpleData
simple = do
  x <- many (noneOf (Just '\n'))
  pure $ SimpleData x

simple' :: Parsec String Text SimpleData'
simple' = do
  x <- many (many (noneOf (Just '\n')))
  pure $ SimpleData' x

example2 :: Text
example2 = "abcxyzzzz\nhello\n"

main :: IO ()
main = do
  print "Simple:"
  case parse simple "<stdin>" example2 of
    Left bundle -> putStr (errorBundlePretty bundle)
    Right result -> print result
  print "Simple':"
  case parse simple' "<stdin>" example2 of
    Left bundle -> putStr (errorBundlePretty bundle)
    Right result -> print result
  print "done.."

The above unfortunately runs into an infinite loop / space leak upon entering simple' as it outputs the following:

Hello, Haskell!
[]
"Simple:"
SimpleData "abcxyzzzz"
"Simple':"

Using megaparsec-7.0.5 (not the latest 9.x.x).

Is there possibly a simpler approach to getting multiple lines?


Solution

  • Apply many only to a parser that either consumes at least one token (here, one Char) or fails. That's because many works by running its argument until it fails. many x may consume zero tokens, so many (many x) breaks this requirement.

    Note that a line should at least involve a terminating newline. That allows that requirement to be fulfilled.

    oneline :: Parsec String Text String
    oneline = many (noneOf (Just '\n')) <* single '\n'
    
    manylines :: Parsec String Text [String]
    manylines = many oneline
    
    simple :: Parsec String Text SimpleData
    simple = do
      x <- oneline
      pure $ SimpleData x
    
    simple' :: Parsec String Text SimpleData'
    simple' = do
      x <- manylines
      pure $ SimpleData' x
    

    A looser requirement for many p is that any repetition of p must fail after a finite number of iterations (and here p = many x never fails), so p might consume nothing in some steps, but then it must be stateful so that after some repetitions it eventually consumes something or fails. But the above approximation is a pretty good rule of thumb in practice.