I'm working on a small parser using Megaparsec and trying to parse arithmetic.
-- Arithmetic expressions
data Aexp = N Num
| V Var
| Mult Aexp Aexp
| Add Aexp Aexp
| Sub Aexp Aexp
deriving (Show, Eq, Read)
arithParser :: Parser Aexp
arithParser = V <$> strParser
<|> N <$> numParser
<|> Mult <$> arithParser <* tok "*" <*> arithParser
--boolParser :: Parser Bexp
strParser :: Parser Var
strParser = tok "\"" *> some (noneOf ("\n\r\"=[]{},:")) <* tok "\""
numParser :: Parser Num
numParser = (some (oneOf ['0' .. '9']) >>= return . read) <* whitespace
If I run the command Parse arithParser "5*5" "5*5"
it just returns Right (N 5)
, where it should return Mult(N 5) (N 5)
. Because of the precedence in the arithParser. But if I change the order then it seems to go into an infinite loop and crash.
Not sure what I'm doing wrong here, any help would be appreciated.
Parsec tries the left alternative of <|>
before it tries the right one. If the left alternative succeeds then it won't bother with the right one. So in this instance, when fed the input 5*5
, Parsec's process looks like this:
V <$> strParser
. strParser
begins with tok "\""
, but the input string doesn't begin with '"'
so strParser
fails.N <$> numParser
. numParser
successfully parses the number 5, so Parsec just returns N 5
.So we can attempt to patch this parser up by moving the Mult
option up to the top, wrapped in a try
so that it can backtrack and try numParser
or strParser
if the input turns out not to be a multiplication.
arithParser :: Parser Aexp
arithParser = try (Mult <$> arithParser <* tok "*" <*> arithParser)
<|> N <$> numParser
<|> V <$> strParser
This parser has another, more subtle problem. Let's walk through the steps, as above.
try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins with arithParser
, so recursively call arithParser
.try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins with arithParser
, so recursively call arithParser
.try (Mult <$> arithParser <* tok "*" <*> arithParser)
. This parser begins with arithParser
, so recursively call arithParser
.It's an infinite loop. Parsec can't handle left-recursive grammars. You have to design your parser so that it consumes at least one token before a recursive call. One common way of doing this is to "flatten out" your grammar:
expr, term :: Parser AExp
expr = do
n <- term
rest <- optional $ tok "*" *> expr
return $ maybe n (Mult n) rest
term = N <$> numParser
<|> V <$> strParser
<|> parenthesised expr
parenthesised = between (char '(') (char ')')
Here I've split up the parser into one which parses an arbitrary expr
- a term
optionally followed by a multiplication symbol and a multiplicand expr
- and one which parses single term
s - numbers, strings, and parenthesised expressions. The recursive calls to expr
are OK now - the one inside expr
happens only after you've parsed a term
(which always consumes input) and the one inside term
happens only after you've parsed an opening parenthesis.
Note that expr
has a list-like structure: it parses a single thing possibly followed by many things. In general you should think of parsers consuming a linear input stream of input tokens, so it's not surprising that list-shaped parsers tend to be more effective than tree-shaped ones.
The Control.Monad.Combinators.Expr
module contains functions which package up this pattern and parse expressions with arbitrary precedence and fixity rules.
expr = makeExprParser term [[InfixR $ tok "*" $> Mult]]