I'm a beginner with Megaparsec and Haskell in general, and I'm trying to write a parser for the following grammar:
A word will always be one of:
- A number composed of one or more ASCII digits (e.g. "0" or "1234"), OR
- A simple word composed of one or more ASCII letters (e.g. "a" or "they"), OR
- A contraction of two simple words joined by a single apostrophe (e.g. "it's" or "they're")
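The grammar restricts both kinds of word to ASCII, but Megaparsec's `letterChar` and `numberChar` accept any Unicode letter or numeric character (for example `é`). A minimal sketch of token parsers that follow the grammar exactly, assuming a `Parsec Void String` parser; the `ascii*` names are hypothetical helpers, not part of Megaparsec:

```haskell
import Data.Char (isAsciiLower, isAsciiUpper)
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as C

type Parser = M.Parsec Void String

-- Hypothetical helper: exactly one ASCII letter, 'a'..'z' or 'A'..'Z'.
asciiLetter :: Parser Char
asciiLetter = M.satisfy (\c -> isAsciiLower c || isAsciiUpper c) M.<?> "ASCII letter"

-- A number: one or more ASCII digits (C.digitChar only accepts '0'..'9').
asciiNumber :: Parser String
asciiNumber = M.some C.digitChar

-- A simple word: one or more ASCII letters.
asciiSimpleWord :: Parser String
asciiSimpleWord = M.some asciiLetter

-- A contraction: two simple words joined by exactly one apostrophe.
asciiContraction :: Parser String
asciiContraction = do
  left <- asciiSimpleWord
  _ <- C.char '\''
  right <- asciiSimpleWord
  pure (left ++ "'" ++ right)
```

With `M.parseMaybe` (which also requires the end of input), `asciiSimpleWord` accepts `"they"` but rejects `"café"`, while `M.some C.letterChar` accepts both.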
So far, I've got the following (this can probably be simplified):
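module MyParser where
import Control.Monad (void)
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as C
type Parser = M.Parsec Void String
data Word = Number String | SimpleWord String | Contraction String deriving (Show)
word :: Parser MyParser.Word
word = M.choice
  [ Number <$> number
  , Contraction <$> M.try contraction -- `try` lets choice fall back to simpleWord
  , SimpleWord <$> simpleWord
  ]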
number :: Parser String
number = M.some C.numberChar
simpleWord :: Parser String
simpleWord = M.some C.letterChar
contraction :: Parser String
contraction = do
  left <- simpleWord
  void $ C.char '\''
  right <- simpleWord
  return (left ++ "'" ++ right)
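In Megaparsec, `<|>` (and therefore `M.choice`) only moves on to the next alternative when the previous one failed without consuming input. `contraction` consumes the letters of a plain word such as `they` before it fails on the missing apostrophe, so without `M.try` the `SimpleWord` alternative is never reached. A small sketch comparing the two versions; `wordNoTry` and `wordWithTry` are names made up for this example:

```haskell
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as C

type Parser = M.Parsec Void String

simpleWord :: Parser String
simpleWord = M.some C.letterChar

contraction :: Parser String
contraction = do
  left <- simpleWord
  _ <- C.char '\''
  right <- simpleWord
  pure (left ++ "'" ++ right)

-- Without `try`: on "they", `contraction` consumes "they" before failing,
-- so `choice` never gets to try `simpleWord`.
wordNoTry :: Parser String
wordNoTry = M.choice [contraction, simpleWord]

-- With `try`: a failed `contraction` rewinds the input, so `simpleWord` runs.
wordWithTry :: Parser String
wordWithTry = M.choice [M.try contraction, simpleWord]
```

On `"they"`, `M.parseMaybe wordNoTry` returns `Nothing` while `M.parseMaybe wordWithTry` returns `Just "they"`; both accept `"they're"`.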
But I'm having problems defining a parser that skips whitespace and anything else that is non-alphanumeric. For example, given the input 'abc', the parser should discard the apostrophes and return just the simple word "abc".
The following doesn't compile:
filler :: Parser Char
filler = M.some (C.spaceChar A.<|> not C.alphaNumChar)
spaceConsumer :: Parser ()
spaceConsumer = L.space filler A.empty A.empty
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer
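The snippet above fails to type-check for three reasons: `not` has type `Bool -> Bool`, so it cannot be applied to a parser such as `C.alphaNumChar`; `M.some` returns a list, so `M.some (...)` is a `Parser String`, not a `Parser Char`; and `L.space` expects its first argument to be a `Parser ()`. A minimal sketch that avoids all three by testing each character with a predicate instead of negating a parser. It uses `M.takeWhile1P`, which consumes the longest non-empty run of matching characters in one step; this is a swapped-in alternative to `M.some (M.satisfy ...)`, not what the code in this question uses:

```haskell
import Control.Applicative (empty)
import Control.Monad (void)
import qualified Data.Char as C
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = M.Parsec Void String

-- One or more characters that are neither letters nor digits.
-- `not` is applied to the Bool returned by the predicate, not to a parser.
filler :: Parser ()
filler = void (M.takeWhile1P (Just "filler") (not . C.isAlphaNum))

-- No line or block comments, so the last two arguments are `empty`.
spaceConsumer :: Parser ()
spaceConsumer = L.space filler empty empty

-- Run a parser, then skip any filler that follows it.
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer

simpleWord :: Parser String
simpleWord = lexeme (M.some MC.letterChar)
```

Here `M.parseMaybe (spaceConsumer *> simpleWord) "'abc'"` gives `Just "abc"`: the leading apostrophe is skipped by `spaceConsumer` and the trailing one by `lexeme`.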
Here is the complete working code that I came up with.
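module WordCount where
import qualified Control.Applicative as A
import Control.Monad (void)
import qualified Data.Char as C
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC
import qualified Text.Megaparsec.Char.Lexer as L
type Parser =
  M.Parsec
    -- The type for custom error messages. We have none, so use `Void`.
    Void
    -- The input stream type. Let's use `String` for now.
    String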
data Word = Number String | SimpleWord String | Contraction String deriving (Eq)
instance Show WordCount.Word where
  show (Number x) = x
  show (SimpleWord x) = x
  show (Contraction x) = x
words :: String -> Either String [String]
-- Force the parser to consume the entire input.
-- `<*` sequences two actions, discarding the value of the second one.
words input = case M.parse (M.some WordCount.word A.<* M.eof) "" input of
  -- err :: M.ParseErrorBundle String Void
  -- A pure function cannot print; the caller can `putStr` the message in IO.
  Left err -> Left (M.errorBundlePretty err)
  Right ws -> Right (map show ws)
word :: Parser WordCount.Word
word =
  M.skipManyTill filler $
    lexeme $
      -- <$> is infix for 'fmap'
      M.choice
        [ Number <$> number,
          Contraction <$> M.try contraction,
          SimpleWord <$> simpleWord
        ]
number :: Parser String
number = M.some MC.numberChar
simpleWord :: Parser String
simpleWord = M.some MC.letterChar
contraction :: Parser String
contraction = do
  left <- simpleWord
  void $ MC.char '\''
  right <- simpleWord
  return $ left ++ "'" ++ right
-- Separator characters: anything that is not a letter or digit
-- (whitespace is already non-alphanumeric, so it needs no separate check)
isSep :: Char -> Bool
isSep = not . C.isAlphaNum
-- Fillers fill the space between tokens
filler :: Parser ()
filler = void $ M.some $ M.satisfy isSep
-- The 2nd and 3rd arguments to L.space parse line and block comments; we have none
spaceConsumer :: Parser ()
spaceConsumer = L.space filler A.empty A.empty
-- Run a parser, then skip any filler that follows it
lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer
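The `word` parser above uses `M.skipManyTill filler`, which tries the word parser first and runs `filler` only when that attempt fails without consuming input, so each word skips its own leading filler. A common Megaparsec alternative is to consume leading filler once at the very start and let each `lexeme` eat the filler that follows it. A minimal sketch of that variant, assuming the same `Word` type; `allWords` is a hypothetical name, and `M.takeWhile1P` stands in for `M.some (M.satisfy isSep)`:

```haskell
import Prelude hiding (Word)
import Control.Applicative (empty)
import Control.Monad (void)
import qualified Data.Char as C
import Data.Void (Void)
import qualified Text.Megaparsec as M
import qualified Text.Megaparsec.Char as MC
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = M.Parsec Void String

data Word = Number String | SimpleWord String | Contraction String
  deriving (Eq, Show)

-- Skip runs of non-alphanumeric characters; there is no comment syntax.
spaceConsumer :: Parser ()
spaceConsumer = L.space (void (M.takeWhile1P Nothing (not . C.isAlphaNum))) empty empty

lexeme :: Parser a -> Parser a
lexeme = L.lexeme spaceConsumer

-- One word plus the filler after it; `try` lets "abc'" fall back to a simple word.
word :: Parser Word
word =
  lexeme $
    M.choice
      [ Number <$> M.some MC.numberChar
      , Contraction <$> M.try contraction
      , SimpleWord <$> M.some MC.letterChar
      ]
  where
    contraction = do
      left <- M.some MC.letterChar
      _ <- MC.char '\''
      right <- M.some MC.letterChar
      pure (left ++ "'" ++ right)

-- Consume leading filler once, then each word eats its own trailing filler.
allWords :: Parser [Word]
allWords = spaceConsumer *> M.many word <* M.eof
```

For example, `M.parseMaybe allWords "'abc' it's 42, they're!"` yields `Just [SimpleWord "abc", Contraction "it's", Number "42", Contraction "they're"]`. Unlike `M.some`, `M.many` also accepts input with no words, so `M.parseMaybe allWords ""` gives `Just []`; use `M.some` if empty input should be an error.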