parsinghaskelloperatorsparsec

How do I parse Python-style chaining operators in Haskell using parsec?


In the project I'm currently working on, I construct the expression parser in parsec. The code is something like:

opTable :: [[Operator Parser Expr]]
  opTable =
    [ 
      -- ...
      [ InfixL $ binary (ArithOp . Exp) TokExp ]
    , [ InfixL $ binary (ArithOp . Mod) TokMod
      , InfixL $ binary (ArithOp . Mul) TokMul
      , InfixL $ binary (ArithOp . Div) TokDiv
      ]

    , [ InfixL $ binary (ArithOp . Add) TokAdd
      , InfixL $ binary (ArithOp . Sub) TokSub
      ]

    , [ InfixL $ binary (ArithOp . Max) TokMax
      , InfixL $ binary (ArithOp . Min) TokMin
      ]

      -- =
    , [ InfixL $ binary (ChainOp . EQ) TokEQ
      -- ~, <, <=, >, >=
      , InfixL $ binary (ChainOp . NEQ) TokNEQ
      , InfixL $ binary (ChainOp . NEQU) TokNEQU
      , InfixL $ binary (ChainOp . LT) TokLT
      , InfixL $ binary (ChainOp . LTE) TokLTE
      , InfixL $ binary (ChainOp . LTEU) TokLTEU
      , InfixL $ binary (ChainOp . GT) TokGT
      , InfixL $ binary (ChainOp . GTE) TokGTE
      , InfixL $ binary (ChainOp . GTEU) TokGTEU
      ]

      -- &&
    , [ InfixL $ binary (ArithOp . Conj) TokConj
      , InfixL $ binary (ArithOp . ConjU) TokConjU
      -- ||
      , InfixL $ binary (ArithOp . Disj) TokDisj
      , InfixL $ binary (ArithOp . DisjU) TokDisjU
      ]
      -- =>
    , [ InfixR $ binary (ArithOp . Implies) TokImpl
      , InfixR $ binary (ArithOp . ImpliesU) TokImplU
      -- <=>
      , InfixL $ binary (ChainOp . EQProp) TokEQProp
      , InfixL $ binary (ChainOp . EQPropU) TokEQPropU
      ]
    ]

Note that there are 2 kinds of operators, namely ArithOp and ChainOp. ArithOps are normal operators, and ChainOps are Python-style chaining operator, such as a <= b < c. There are many chaining operators. For example, there are TokLT (<) and TokEQProp (<=>) Unfortunately, the precedence of ArithOps and ChainOps are intermingled. Now say I want to parse expressions into this syntax tree:

data Expr
  = Lit Lit
  | Var Name
  | Op Op
  | App Expr Expr
  | Chain Chain
  -- ...

-- ...

data Chain = Pure Expr | Ch Chain Op Expr

Is it possible that we can use the expression parser to construct such syntax trees with chains? For instance, something like 1 <= 1 + 1 < 3 should become:

Ch (
  Ch (
    Lit 1
  )
  ( Op "<=")
  (
    App (
      App (Op "+") (Lit 1)
    )
    (Lit 1)
  )
)
(Op "<")
(Lit 3)

(I omitted some constructors for clarity.)

Is this kind of chaining operators parseable using the expression parser? I know I can construct the parser by hand, by rewriting it into layers of parsers. However, using the expression parser is definitely simpler.

I found some combinators like chainl. However, it doesn't seem possible to use it to form the Chain above. Tell me if I can use the expression parser above. If it's not, tell me it's not and I will construct it by hand.


Solution

  • Syntactically, there's no real difference between your chain and arithmetic operators. They are (left or right) associative operators with different levels of precedence, and a standard expression parser should have no trouble parsing them. All you need to do is make sure the appropriate AST is constructed, which you can do by using two variations for your binary function in the elements of the operator table:

    As a stripped down example, something like the following should work:

    opTable :: [[Operator Parser Expr]]
    opTable =
        [ [ InfixL $ arithOp Mul "*" ]
        , [ InfixL $ arithOp Add "+" ]
        , [ InfixL $ chainOp Eq "=" ]
        , [ InfixL $ arithOp Conj "&&" ]
        , [ InfixR $ arithOp Implies "=>" ]
        , [ InfixL $ chainOp EQProp "<=>" ]
        ]
      where arithOp op tok = makeArith <$ string tok
              where makeArith a b = App (App (Op op) a) b
            chainOp op tok = makeChain <$ string tok
              where makeChain a b = Chain $ Ch (asChain a) op b
                    asChain (Chain c) = c
                    asChain e = Pure e
    

    The only tricky part here is correctly defining makeChain, which needs to handle both chains and non-chains as its first argument.

    A full, runnable example:

    import Text.Parsec
    import Text.Parsec.String
    import Control.Monad.Combinators.Expr
    
    type Lit = Int
    type Name = String
    
    data Expr
      = Lit Lit
      | Var Name
      | Op Op
      | App Expr Expr
      | Chain Chain
      deriving (Show)
    
    data Chain = Pure Expr | Ch Chain Op Expr
      deriving (Show)
    
    data Op = Mul | Add | Eq | Lte | Lt | Conj | Implies | EQProp
      deriving (Show)
    
    reservedOp :: String -> Parser String
    reservedOp s = try $ string s <* notFollowedBy (oneOf "*+=<>&")
    
    opTable :: [[Operator Parser Expr]]
    opTable =
        [ [ InfixL $ arithOp Mul "*" ]
        , [ InfixL $ arithOp Add "+" ]
        , [ InfixL $ chainOp Eq "="
          , InfixL $ chainOp Lte "<="
          , InfixL $ chainOp Lt "<"
          ]
        , [ InfixL $ arithOp Conj "&&" ]
        , [ InfixR $ arithOp Implies "=>" ]
        , [ InfixL $ chainOp EQProp "<=>" ]
        ]
      where arithOp op tok = makeArith <$ reservedOp tok
              where makeArith a b = App (App (Op op) a) b
            chainOp op tok = makeChain <$ reservedOp tok
              where makeChain a b = Chain $ Ch (asChain a) op b
                    asChain (Chain c) = c
                    asChain e = Pure e
    
    expr :: Parser Expr
    expr = makeExprParser term opTable
    
    term :: Parser Expr
    term = Lit . read <$> many1 digit
    
    main :: IO ()
    main = do
      parseTest expr "1<=1+1<3"
      -- output: Chain (Ch (Ch (Pure (Lit 1)) Lte (App (App (Op Add) (Lit 1)) (Lit 1))) Lt (Lit 3))