Using Megaparsec, if I want to parse a string containing comments of the form ~{content} into a Comment record, how would I go about doing that? For instance:
data Comment = Comment { id :: Integer, content :: String }
parse :: Parser [Comment]
parse = _
parse "hello world ~{1-sometext} bla bla ~{2-another comment}"
  == [Comment { id = 1, content = "sometext" }, Comment { id = 2, content = "another comment" }]
The thing I'm stuck on is allowing for everything that's not ~{} to be ignored, including the lone char ~ and the lone brackets {}.
You can do this by dropping characters up to the next tilde, then parsing the tilde optionally followed by a valid comment, and looping.
In particular, if we define nonTildes to discard non-tildes:
nonTildes :: Parser String
nonTildes = takeWhileP (Just "non-tilde") (/= '~')
and then an optionalComment to parse a tilde and optional following comment in braces:
optionalComment :: Parser (Maybe Comment)
optionalComment = char '~' *>
  optional (braces (Comment <$> ident_ <* char '-' <*> content_))
  where
    braces = between (char '{') (char '}')
    ident_ = read <$> takeWhile1P (Just "digit") isDigit
    content_ = takeWhileP Nothing (/= '}')
Then the comments can be parsed with:
comments :: Parser [Comment]
comments = catMaybes <$> (nonTildes *> many (optionalComment <* nonTildes))
This assumes that a ~{ without a matching } is a parse error, rather than valid non-comment text, which seems sensible. However, the definition of the content_ parser is probably too liberal. It gobbles everything up to the next }, meaning that:
"~{1-{{{\n}"
is a valid comment with content "{{{\n". Disallowing { (and maybe ~) in comments, or alternatively requiring braces to be properly nested in comments, seems like a good idea.
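As a rough sketch of those two tightenings, here are two possible replacements for content_. The names strictContent, nestedContent and commentWith are made up for illustration and are not part of the answer's code; commentWith just wraps the same tilde/braces/ident logic so either content parser can be plugged in.

```haskell
import Data.Char (isDigit)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

data Comment = Comment { ident :: Integer, content :: String }
  deriving (Show, Eq)

-- Variant 1 (hypothetical name): content may not contain '{', '}' or '~'.
strictContent :: Parser String
strictContent = takeWhileP (Just "comment text") (`notElem` "{}~")

-- Variant 2 (hypothetical name): braces are allowed inside the content,
-- but only as properly nested pairs. The braces are kept in the result.
nestedContent :: Parser String
nestedContent = concat <$> many (plain <|> group)
  where
    -- takeWhile1P fails without consuming on empty input, so `many` terminates.
    plain = takeWhile1P (Just "comment text") (`notElem` "{}")
    group = do
      _ <- char '{'
      inner <- nestedContent
      _ <- char '}'
      pure ("{" ++ inner ++ "}")

-- Hypothetical helper: a single "~{N-content}" comment, parameterised
-- by the parser used for the content part.
commentWith :: Parser String -> Parser Comment
commentWith body =
  char '~' *> between (char '{') (char '}')
    (Comment <$> identP <* char '-' <*> body)
  where
    identP = read <$> takeWhile1P (Just "digit") isDigit
```

With either variant, the problematic input "~{1-{{{\n}" should no longer be accepted as a comment, while "~{1-a{b}c}" is accepted by the nested variant only.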
Anyway, here's a full code example for you to fiddle with:
{-# OPTIONS_GHC -Wall #-}
import Data.Char
import Data.Maybe
import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char
type Parser = Parsec Void String
data Comment = Comment { ident :: Integer, content :: String } deriving (Show)
nonTildes :: Parser String
nonTildes = takeWhileP (Just "non-tilde") (/= '~')
optionalComment :: Parser (Maybe Comment)
optionalComment = char '~' *>
  optional (braces (Comment <$> ident_ <* char '-' <*> content_))
  where
    braces = between (char '{') (char '}')
    ident_ = read <$> takeWhile1P (Just "digit") isDigit
    content_ = takeWhileP Nothing (/= '}')
comments :: Parser [Comment]
comments = catMaybes <$> (nonTildes *> many (optionalComment <* nonTildes))
main :: IO ()
main = do
  parseTest comments "hello world ~{1-sometext} bla bla ~{2-another comment}"
  parseTest comments "~~~ ~~~{1-sometext} {junk}"
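If you would rather treat a malformed or unterminated ~{ as ordinary text instead of a parse error (the opposite of the assumption made above), one possible approach is to wrap the braces part in try, so that a failure partway through rewinds the input. This is only a sketch; lenientComment and lenientComments are hypothetical names, and the rest mirrors the definitions above.

```haskell
import Data.Char (isDigit)
import Data.Maybe (catMaybes)
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (char)

type Parser = Parsec Void String

data Comment = Comment { ident :: Integer, content :: String }
  deriving (Show, Eq)

nonTildes :: Parser String
nonTildes = takeWhileP (Just "non-tilde") (/= '~')

-- Like optionalComment, but `try` backtracks when the braces part fails
-- after consuming input, so `optional` yields Nothing and the malformed
-- "~{..." is then skipped as plain text by the following nonTildes.
lenientComment :: Parser (Maybe Comment)
lenientComment = char '~' *>
  optional (try (braces (Comment <$> ident_ <* char '-' <*> content_)))
  where
    braces = between (char '{') (char '}')
    ident_ = read <$> takeWhile1P (Just "digit") isDigit
    content_ = takeWhileP Nothing (/= '}')

lenientComments :: Parser [Comment]
lenientComments =
  catMaybes <$> (nonTildes *> many (lenientComment <* nonTildes))
```

Note that content_ is still the liberal version here, so an unterminated comment followed later by a } will swallow everything up to that brace.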