haskellmegaparsec

Megaparsec: Unable to parse recursive arithmetic string


I'm working on a small parser using Megaparsec and trying to parse arithmetic.

-- Arithmetic expressions
data Aexp = N Num 
            | V Var 
            | Mult Aexp Aexp
            | Add Aexp Aexp 
            | Sub Aexp Aexp 
             deriving (Show, Eq, Read)


arithParser :: Parser Aexp
arithParser = V <$> strParser
            <|> N <$> numParser
            <|> Mult <$> arithParser <* tok "*" <*> arithParser
--boolParser :: Parser Bexp


strParser :: Parser Var
strParser = tok "\"" *> some (noneOf ("\n\r\"=[]{},:")) <* tok "\""

numParser :: Parser Num
numParser = (some (oneOf ['0' .. '9']) >>= return . read) <* whitespace

If I run the command Parse arithParser "5*5" "5*5" it just returns Right (N 5), where it should return Mult(N 5) (N 5). Because of the precedence in the arithParser. But if I change the order then it seems to go into an infinite loop and crash.

Not sure what I'm doing wrong here, any help would be appreciated.


Solution

  • Parsec tries the left alternative of <|> before it tries the right one. If the left alternative succeeds then it won't bother with the right one. So in this instance, when fed the input 5*5, Parsec's process looks like this:

    1. Try V <$> strParser. strParser begins with tok "\"", but the input string doesn't begin with '"' so strParser fails.
    2. Try N <$> numParser. numParser successfully parses the number 5, so Parsec just returns N 5.
    3. Done! No need to try the third alternative.

    So we can attempt to patch this parser up by moving the Mult option up to the top, wrapped in a try so that it can backtrack and try numParser or strParser if the input turns out not to be a multiplication.

    arithParser :: Parser Aexp
    arithParser = try (Mult <$> arithParser <* tok "*" <*> arithParser)
                <|> N <$> numParser
                <|> V <$> strParser
    

    This parser has another, more subtle problem. Let's walk through the steps, as above.

    1. Try try (Mult <$> arithParser <* tok "*" <*> arithParser). This parser begins with arithParser, so recursively call arithParser.
    2. Try try (Mult <$> arithParser <* tok "*" <*> arithParser). This parser begins with arithParser, so recursively call arithParser.
    3. Try try (Mult <$> arithParser <* tok "*" <*> arithParser). This parser begins with arithParser, so recursively call arithParser.
    4. ...

    It's an infinite loop. Parsec can't handle left-recursive grammars. You have to design your parser so that it consumes at least one token before a recursive call. One common way of doing this is to "flatten out" your grammar:

    expr, term :: Parser AExp
    expr = do
        n <- term
        rest <- optional $ tok "*" *> expr
        return $ maybe n (Mult n) rest
    term = N <$> numParser
        <|> V <$> strParser
        <|> parenthesised expr
    
    parenthesised = between (char '(') (char ')')
    

    Here I've split up the parser into one which parses an arbitrary expr - a term optionally followed by a multiplication symbol and a multiplicand expr - and one which parses single terms - numbers, strings, and parenthesised expressions. The recursive calls to expr are OK now - the one inside expr happens only after you've parsed a term (which always consumes input) and the one inside term happens only after you've parsed an opening parenthesis.

    Note that expr has a list-like structure: it parses a single thing possibly followed by many things. In general you should think of parsers consuming a linear input stream of input tokens, so it's not surprising that list-shaped parsers tend to be more effective than tree-shaped ones.

    The Control.Monad.Combinators.Expr module contains functions which package up this pattern and parse expressions with arbitrary precedence and fixity rules.

    expr = makeExprParser term [[InfixR $ tok "*" $> Mult]]