
Megaparsec: transforming comment syntax into a Record


Using Megaparsec, if I want to parse a string containing comments of the form ~{content} into a Comment record, how would I go about doing that? For instance:

data Comment = Comment { id :: Integer, content :: String }

parse :: Parser [Comment]
parse = _

parse
  "hello world ~{1-sometext} bla bla ~{2-another comment}" 
  == [Comment { id = 1, content = "sometext" }, Comment { id = 2, content = "another comment"}]

The thing I'm stuck on is allowing for everything that's not ~{} to be ignored, including the lone char ~ and the lone brackets {}.


Solution

  • You can do this by dropping characters up to the next tilde, then parsing the tilde optionally followed by a valid comment, and looping.

    In particular, if we define nonTildes to consume a (possibly empty) run of non-tilde characters:

    nonTildes :: Parser String
    nonTildes = takeWhileP (Just "non-tilde") (/= '~')
    

    and then an optionalComment to parse a tilde and optional following comment in braces:

    optionalComment :: Parser (Maybe Comment)
    optionalComment = char '~' *>
      optional (braces (Comment <$> ident_ <* char '-' <*> content_))
      where
        braces = between (char '{') (char '}')
        ident_ = read <$> takeWhile1P (Just "digit") isDigit
        content_ = takeWhileP Nothing (/= '}')
    

    Then the comments can be parsed with:

    comments :: Parser [Comment]
    comments = catMaybes <$> (nonTildes *> many (optionalComment <* nonTildes))
    

    This assumes that a ~{ that does not go on to form a complete comment (for example, one without a matching }) is a parse error, rather than valid non-comment text, which seems sensible. However, the definition of the content_ parser is probably too liberal. It gobbles everything up to the next }, meaning that:

    "~{1-{{{\n}"
    

    is a valid comment with content "{{{\n". Disallowing { (and maybe ~) in comments, or alternatively requiring braces inside comments to be properly nested, seems like a good idea.
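
    One way to tighten content_ could look like the sketch below. It assumes that braces inside a comment must be balanced and that ~ is not allowed in comment text. The names strictContent and strictComment are hypothetical and are not used elsewhere in this answer:

    ```haskell
    {-# OPTIONS_GHC -Wall #-}

    import Data.Char (isDigit)
    import Data.Void (Void)
    import Text.Megaparsec
    import Text.Megaparsec.Char (char, string)

    type Parser = Parsec Void String

    data Comment = Comment { ident :: Integer, content :: String } deriving (Show)

    -- Hypothetical stricter replacement for content_: plain text may not
    -- contain '{', '}' or '~', and any braces that do appear must be balanced.
    strictContent :: Parser String
    strictContent = concat <$> many (plain <|> nested)
      where
        -- A non-empty run of ordinary characters.
        plain  = takeWhile1P (Just "comment text") (`notElem` "{}~")
        -- A balanced {...} group, kept verbatim in the content.
        nested = (\s -> "{" ++ s ++ "}")
                   <$> between (char '{') (char '}') strictContent

    -- Hypothetical single-comment parser built on strictContent.
    strictComment :: Parser Comment
    strictComment = between (string "~{") (char '}')
      (Comment <$> ident_ <* char '-' <*> strictContent)
      where
        ident_ = read <$> takeWhile1P (Just "digit") isDigit

    main :: IO ()
    main = do
      parseTest strictComment "~{1-a {b} c}"  -- balanced braces: accepted
      parseTest strictComment "~{1-{{{\n}"    -- unbalanced braces: parse error
    ```

    With this variant, "~{1-{{{\n}" is rejected because the inner braces are never closed, while newlines in the content are still accepted.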

    Anyway, here's a full code example for you to fiddle with:

    {-# OPTIONS_GHC -Wall #-}
    
    import Data.Char
    import Data.Maybe
    import Data.Void
    import Text.Megaparsec
    import Text.Megaparsec.Char
    
    type Parser = Parsec Void String
    
    data Comment = Comment { ident :: Integer, content :: String } deriving (Show)
    
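    -- Consume a (possibly empty) run of characters up to the next tilde.
    nonTildes :: Parser String
    nonTildes = takeWhileP (Just "non-tilde") (/= '~')
    
    -- Parse a tilde, then a comment in braces if one follows. A lone tilde
    -- yields Nothing; a '{' that does not start a valid comment is an error.
    optionalComment :: Parser (Maybe Comment)
    optionalComment = char '~' *>
      optional (braces (Comment <$> ident_ <* char '-' <*> content_))
      where
        braces = between (char '{') (char '}')
        ident_ = read <$> takeWhile1P (Just "digit") isDigit
        content_ = takeWhileP Nothing (/= '}')  -- everything up to the next '}'
    
    -- Skip leading text, then alternate comments and non-tilde text,
    -- keeping only the comments that were actually found.
    comments :: Parser [Comment]
    comments = catMaybes <$> (nonTildes *> many (optionalComment <* nonTildes))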
    
    main :: IO ()
    main = do
      parseTest comments "hello world ~{1-sometext} bla bla ~{2-another comment}"
      parseTest comments "~~~ ~~~{1-sometext} {junk}"
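
    If you would rather treat a malformed ~{ as ordinary text than as a parse error, one option is to wrap the brace parser in try, so that a failed attempt backtracks to just after the tilde. This is only a sketch: optionalCommentLenient and commentsLenient are hypothetical names, and the rest mirrors the code above.

    ```haskell
    {-# OPTIONS_GHC -Wall #-}

    import Data.Char (isDigit)
    import Data.Maybe (catMaybes)
    import Data.Void (Void)
    import Text.Megaparsec
    import Text.Megaparsec.Char (char)

    type Parser = Parsec Void String

    data Comment = Comment { ident :: Integer, content :: String } deriving (Show)

    nonTildes :: Parser String
    nonTildes = takeWhileP (Just "non-tilde") (/= '~')

    -- Hypothetical lenient variant: 'try' rewinds the input if the braces
    -- do not form a complete comment, so 'optional' can return Nothing
    -- instead of failing after having consumed the '{'.
    optionalCommentLenient :: Parser (Maybe Comment)
    optionalCommentLenient = char '~' *>
      optional (try (braces (Comment <$> ident_ <* char '-' <*> content_)))
      where
        braces = between (char '{') (char '}')
        ident_ = read <$> takeWhile1P (Just "digit") isDigit
        content_ = takeWhileP Nothing (/= '}')

    commentsLenient :: Parser [Comment]
    commentsLenient =
      catMaybes <$> (nonTildes *> many (optionalCommentLenient <* nonTildes))

    main :: IO ()
    main = do
      -- "~{x}" is not a valid comment, so it is skipped as plain text.
      print (parseMaybe commentsLenient "~{x} ~{1-ok}")
      -- prints: Just [Comment {ident = 1, content = "ok"}]

      -- An unclosed comment is also skipped rather than reported.
      print (parseMaybe commentsLenient "~{1-never closed")
      -- prints: Just []
    ```

    The trade-off is that mistakes are silently ignored. Also note that content_ still runs to the next }, so "~{1-foo ~{2-bar}" yields a single comment with content "foo ~{2-bar".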