parsinghaskellparser-combinators

how to parse unary expressions using parser combinators?


I'm trying to parse expressions using my own handwritten parser combintor library in Haskell. already I can parse binary expressions with the code below, that I found in dhall project, but I do not understand it properly.

this is how I defined my expressions:

data Expression
  = BinaryOperation Expression Operator Expression
  | UnaryOperation Operator Expression
  deriving (Show, Eq)

and this is the function I took from dhall project. you can read more of the dhall code at here

makeOperator :: Parser Operator -> Parser Expression -> Parser Expression
makeOperator parseOperator parseValue = do
  l0 <- parseValue
  fs <- many do
    operator <- do
      _ <- wsP
      parseOperator
    _ <- wsP
    r <- parseValue
    pure $ \l -> BinaryOperation l operator r
  pure $ foldl (\l f -> f l) l0 fs

I tried replicating the dhall code, by writing the code below to parse unary expressions, but it seems this code consumes the input and doesn't let the parsers with lower precedence do their job. E.g. +2 + 2 will become +2 and + 2 will be just leftover string of the parser.

makeOperatorUnary :: Parser Operator -> Parser Expression -> Parser Expression
makeOperatorUnary parseOperator parseValue = do
  operator <- parseOperator
  value <- parseValue
  pure $ UnaryOperation operator value

I would be glad if someone did explain how the dhall code exactly worked and lead me into the right direction.

thank you


Solution

  • The way the makeOperator combinator works, you supply it with a parser parseOperator for the set of equal precedence, left associative binary operators you're interested in (e.g., + and - binary operators at the same precedence), as well as a parser parseValue for parsing expressions at a higher precedence level (e.g., multiplication and division), and it implements precedence by parsing each of the parseValue-precedence expressions as self-contained units that are joined by the parseOperator operators at the current precedence level.

    So, to parse binary + and - expressions, you want to use something like:

    parseAddSub :: Parser Expression
    parseAddSub = makeOperator parseBinaryPlusOrMinusOperator parseMulDiv
    

    where parseBinaryPlusOrMinusOperator takes care of parsing the + and - operators, and parseMulDiv is the next highest precedence expression parser that will ensure that, say, 1*2 and 3*4 are parsed as units, so when joined by + in the expression 1*2+3*4, the result will end up being equivalent to (1*2)+(3*4), correctly implementing the precedence.

    Your makeOperatorUnary combinator is similar. It takes a parser for the unary operator of interest, say -, and then it takes a parser for higher precedence expressions. That means that if you write:

    parseUnaryMinus = makeOperatorUnary parseUnaryMinusOperator parseAddSub
    

    then you're going to have a problem. If you're trying to parse -3+4, this version of parseUnaryMinus will parse the unary - operator and then parse the entire 3+4 expression as a higher precedence parseAddSub expression, which will give the equivalent of -(3+4), which is wrong precedence.

    You probably want to sneak unary minus in between parseAddSub and parseMulDiv, like this:

    parseAddSub = makeOperator parseBinaryPlusOrMinusOperator parseUnaryMinus
    
    parseUnaryMinus = makeOperatorUnary parseUnaryMinusOperator parseMulDiv
    
    parseMulDiv = ...
    

    This will accept something like 1+-3*4 as equivalent to 1+(-(3*4)), which is probably what you want.

    You also need makeOperatorUnary to optionally parse the operator, the same way parseAddSub checks for, but doesn't require, binary + and - operators. So, instead of your definition above, try:

    makeOperatorUnary parseOperator parseValue
      =   UnaryOperation <$> parseOperator <*> parseValue
      <|> parseValue
    

    Here's a simplified example without whitespace handling to illustrate the design:

    import Data.Void
    import Text.Megaparsec
    import Text.Megaparsec.Char
    
    type Parser = Parsec Void String
    
    data UnaryOperator = UnaryMinus
      deriving (Show)
    
    data Operator = Plus | Minus | Times | Divide
      deriving (Show)
    
    data Expression
      = Literal Int
      | BinaryOperation Expression Operator Expression
      | UnaryOperation UnaryOperator Expression
      deriving (Show)
    
    makeOperator :: Parser Operator -> Parser Expression -> Parser Expression
    makeOperator parseOperator parseValue = do
      l0 <- parseValue
      fs <- many $ do
        operator <- do
          parseOperator
        r <- parseValue
        pure $ \l -> BinaryOperation l operator r
      pure $ foldl (\l f -> f l) l0 fs
    
    makeUnaryOperator :: Parser UnaryOperator -> Parser Expression -> Parser Expression
    makeUnaryOperator parseOperator parseValue
      =   UnaryOperation <$> parseOperator <*> parseValue
      <|> parseValue
    
    
    parseAddSub, parseUnaryMinus, parseMulDiv, parseFactor :: Parser Expression
    
    parseAddSub = makeOperator (Plus <$ char '+' <|> Minus <$ char '-') parseUnaryMinus
    
    parseUnaryMinus = makeUnaryOperator (UnaryMinus <$ char '-') parseMulDiv
    
    parseMulDiv = makeOperator (Times <$ char '*' <|> Divide <$ char '/') parseFactor
    
    parseFactor = Literal . read <$> some digitChar
      <|> between (char '(') (char ')') parseAddSub
    
    main = do
      parseTest parseAddSub "1+-2*3"