parsinghaskellarithmetic-expressionsleft-to-right

Defining a left-to-right parser for arithmetic expressions


I'm having a hard time defining a left-to-right arithmetic expression parser in Haskell. What I have done so far is defining a right-to-left parser, following the "G. Hutton, Programming in Haskell" book.

-- the aexpr integer parser
aexpr_int :: Parser Int
aexpr_int = do
              a1 <- aterm_int
              s <- sign -- return 1 for "+" and -1 for "-"
              a2 <- aexpr_int
              return (a1 + (s * a2))
            <|>
            aterm_int

-- the aterm integer parser
aterm_int :: Parser Int
aterm_int = do
              a1 <- aterm_int
              char '*'
              a2 <- afactor_int
              return (a1 * a2);
            <|>
            do
              a1 <- afactor_int
              char '/'
              a2 <- aterm_int
              return (div a1 a2)
            <|>
            afactor_int

-- afactor_int
afactor_int :: Parser Int
afactor_int = do
                token (char '(')
                e <- aexpr_int
                token (char ')')
                return e
              <|> 
              do
                s <- sign 
                ic <- aexpr_int 
                return (s * ic)
              <|>  
              token int_const   

So this parses 1 - 2 - 3 - 4 as 1 - (2 - (3 - 4)), but I want it to parse as ((1 - 2) - 3) - 4. How can I reach this result?


Solution

  • The trick here is to define an aexpr_int parser that first parses an aterm_int, and then recursively, using a helper function on an accumulating expression, checks for multiple additional occurrences of sign >> aterm_int, adding the additional terms to the accumulator. It might look something like this:

    -- the aexpr integer parser
    aexpr_int :: Parser Int
    aexpr_int = do
      a1 <- aterm_int
      go a1
      where go expr =
              do
                s <- sign
                a2 <- aterm_int
                go (expr + (s * a2))
              <|> return expr
    

    Together with the following:

    -- the aterm integer parser
    aterm_int :: Parser Int
    aterm_int = do
      a1 <- afactor_int
      go a1
      where go expr =
              do
                char '*'
                a2 <- afactor_int
                go (expr * a2)
              <|>
              do
                char '/'
                a2 <- afactor_int
                go (div expr a2)
              <|> return expr
    
    -- afactor_int
    afactor_int :: Parser Int
    afactor_int = do
                    token (char '(')
                    e <- aexpr_int
                    token (char ')')
                    return e
                  <|>
                  do
                    s <- sign
                    ic <- afactor_int
                    return (s * ic)
                  <|>
                  token int_const
    

    this seems to work correctly:

    > parseTest aexpr_int "1-2-3-4"
    -8
    

    Note that, if you're using a real parser library instead of trying to write your own parser code for learning purposes, you'll want to use the library's built-in expression parser or combinators with names like chainl or sepBy to accomplish this.