I'm trying to parse expressions using my own handwritten parser combintor library in Haskell. already I can parse binary expressions with the code below, that I found in dhall project, but I do not understand it properly.
this is how I defined my expressions:
data Expression
= BinaryOperation Expression Operator Expression
| UnaryOperation Operator Expression
deriving (Show, Eq)
and this is the function I took from dhall project. you can read more of the dhall code at here
makeOperator :: Parser Operator -> Parser Expression -> Parser Expression
makeOperator parseOperator parseValue = do
l0 <- parseValue
fs <- many do
operator <- do
_ <- wsP
parseOperator
_ <- wsP
r <- parseValue
pure $ \l -> BinaryOperation l operator r
pure $ foldl (\l f -> f l) l0 fs
I tried replicating the dhall code, by writing the code below to parse unary expressions, but it seems this code consumes the input and doesn't let the parsers with lower precedence do their job. E.g. +2 + 2
will become +2
and + 2
will be just leftover string of the parser.
makeOperatorUnary :: Parser Operator -> Parser Expression -> Parser Expression
makeOperatorUnary parseOperator parseValue = do
operator <- parseOperator
value <- parseValue
pure $ UnaryOperation operator value
I would be glad if someone did explain how the dhall code exactly worked and lead me into the right direction.
thank you
The way the makeOperator
combinator works, you supply it with a parser parseOperator
for the set of equal precedence, left associative binary operators you're interested in (e.g., +
and -
binary operators at the same precedence), as well as a parser parseValue
for parsing expressions at a higher precedence level (e.g., multiplication and division), and it implements precedence by parsing each of the parseValue
-precedence expressions as self-contained units that are joined by the parseOperator
operators at the current precedence level.
So, to parse binary +
and -
expressions, you want to use something like:
parseAddSub :: Parser Expression
parseAddSub = makeOperator parseBinaryPlusOrMinusOperator parseMulDiv
where parseBinaryPlusOrMinusOperator
takes care of parsing the +
and -
operators, and parseMulDiv
is the next highest precedence expression parser that will ensure that, say, 1*2
and 3*4
are parsed as units, so when joined by +
in the expression 1*2+3*4
, the result will end up being equivalent to (1*2)+(3*4)
, correctly implementing the precedence.
Your makeOperatorUnary
combinator is similar. It takes a parser for the unary operator of interest, say -
, and then it takes a parser for higher precedence expressions. That means that if you write:
parseUnaryMinus = makeOperatorUnary parseUnaryMinusOperator parseAddSub
then you're going to have a problem. If you're trying to parse -3+4
, this version of parseUnaryMinus
will parse the unary -
operator and then parse the entire 3+4
expression as a higher precedence parseAddSub
expression, which will give the equivalent of -(3+4)
, which is wrong precedence.
You probably want to sneak unary minus in between parseAddSub
and parseMulDiv
, like this:
parseAddSub = makeOperator parseBinaryPlusOrMinusOperator parseUnaryMinus
parseUnaryMinus = makeOperatorUnary parseUnaryMinusOperator parseMulDiv
parseMulDiv = ...
This will accept something like 1+-3*4
as equivalent to 1+(-(3*4))
, which is probably what you want.
You also need makeOperatorUnary
to optionally parse the operator, the same way parseAddSub
checks for, but doesn't require, binary +
and -
operators. So, instead of your definition above, try:
makeOperatorUnary parseOperator parseValue
= UnaryOperation <$> parseOperator <*> parseValue
<|> parseValue
Here's a simplified example without whitespace handling to illustrate the design:
import Data.Void
import Text.Megaparsec
import Text.Megaparsec.Char
type Parser = Parsec Void String
data UnaryOperator = UnaryMinus
deriving (Show)
data Operator = Plus | Minus | Times | Divide
deriving (Show)
data Expression
= Literal Int
| BinaryOperation Expression Operator Expression
| UnaryOperation UnaryOperator Expression
deriving (Show)
makeOperator :: Parser Operator -> Parser Expression -> Parser Expression
makeOperator parseOperator parseValue = do
l0 <- parseValue
fs <- many $ do
operator <- do
parseOperator
r <- parseValue
pure $ \l -> BinaryOperation l operator r
pure $ foldl (\l f -> f l) l0 fs
makeUnaryOperator :: Parser UnaryOperator -> Parser Expression -> Parser Expression
makeUnaryOperator parseOperator parseValue
= UnaryOperation <$> parseOperator <*> parseValue
<|> parseValue
parseAddSub, parseUnaryMinus, parseMulDiv, parseFactor :: Parser Expression
parseAddSub = makeOperator (Plus <$ char '+' <|> Minus <$ char '-') parseUnaryMinus
parseUnaryMinus = makeUnaryOperator (UnaryMinus <$ char '-') parseMulDiv
parseMulDiv = makeOperator (Times <$ char '*' <|> Divide <$ char '/') parseFactor
parseFactor = Literal . read <$> some digitChar
<|> between (char '(') (char ')') parseAddSub
main = do
parseTest parseAddSub "1+-2*3"