parsinghaskellparsec

Parse a predicate in Haskell


I would like to parse a predicate such as: "3 > 2" or "MyVar = 0".

Ideally, I would use a small GADT to represent the predicate:

data Expr a where
    I   :: Int  -> Expr Int
    B   :: Bool -> Expr Bool
    Var :: String -> Expr Int
    Add :: Expr Int -> Expr Int -> Expr Int
    Eq  :: Eq a => Expr a -> Expr a -> Expr Bool
    Mt  :: Eq a => Expr a -> Expr a -> Expr Bool

The expression 3 > 2 would parse as Mt (I 3) (I 2).

I tried to approach the problem with Parsec. However, the module Text.Parsec.Expr deals only with expressions, with type a -> a -> a. Any suggestions?


Solution

  • Parsing directly into a GADT is actually kind of tricky. In my experience, it's usually better to parse into an untyped ADT first (where the a -> a -> a types are a natural fit), and then separately "type check" the ADT by transforming it into the desired GADT. The main disadvantage is that you have to define two parallel types for the untyped and typed abstract syntax trees. (You can technically get around this with some type level tricks, but it's not worth it for a small language.) However, the resulting design is easier to work with and generally more flexible.

    In other words, I'd suggest using Parsec to parse into the untyped ADT:

    data UExpr where
      UI :: Int -> UExpr
      UB :: Bool -> UExpr
      UVar :: String -> UExpr
      UAdd :: UExpr -> UExpr -> UExpr
      UEq :: UExpr -> UExpr -> UExpr
      UMt :: UExpr -> UExpr -> UExpr
    

    and then write a type checker:

    tc :: UExpr -> Expr a
    

    Actually, you won't be able to write a tc like that. Instead you'll need to break it up into mutually recursive type checkers for different expression types:

    tc_bool :: UExpr -> Expr Bool
    tc_int :: UExpr -> Expr Int
    

    and you'll probably want to run them in a Reader monad that provides a list of valid variables. (Type checking normally involves checking the types of variabels. In your case, you only have integer variables, but it still makes sense to ensure the variables are defined at the type checking stage.)

    If you get stuck, a full solution follows...

    SPOILERS

    .

    .

    .

    As I say, I'd first write a Parsec parser for an untyped UExpr ADT. Note that the Text.Parsec.Expr machinery works fine for UExpr -> UExpr -> UExpr operators:

    {-# LANGUAGE GADTs #-}
    {-# LANGUAGE StandaloneDeriving #-}
    {-# LANGUAGE FlexibleInstances #-}
    {-# LANGUAGE FlexibleContexts #-}
    {-# OPTIONS -Wall -Wno-missing-signatures #-}
    
    import Text.Parsec
    import Text.Parsec.Expr
    import Text.Parsec.String
    import Text.Parsec.Language
    import Control.Monad.Reader
    import Control.Exception
    import Data.Maybe (fromJust)
    import qualified Text.Parsec.Token as P
    
    lexer = P.makeTokenParser haskellDef { P.reservedNames = ["true","false"] }
    
    identifier = P.identifier lexer
    integer = P.integer lexer
    parens = P.parens lexer
    reserved = P.reserved lexer
    reservedOp = P.reservedOp lexer
    symbol = P.symbol lexer
    
    data UExpr where
      UI :: Int -> UExpr
      UB :: Bool -> UExpr
      UVar :: String -> UExpr
      UAdd :: UExpr -> UExpr -> UExpr
      UEq :: UExpr -> UExpr -> UExpr
      UMt :: UExpr -> UExpr -> UExpr
      deriving (Show)
    
    expr :: Parser UExpr
    expr = buildExpressionParser
      [ [Infix (UAdd <$ reservedOp "+") AssocLeft]
      , [Infix (UEq <$ reservedOp "=") AssocNone, Infix (UMt <$ reservedOp ">") AssocNone]
      ] term
    
    term :: Parser UExpr
    term = parens expr
      <|> UI . fromIntegral <$> integer
      <|> UB True <$ reserved "true"
      <|> UB False <$ reserved "false"
      <|> UVar <$> identifier
    
    test_parser :: IO ()
    test_parser = do
      parseTest expr "3 > 2"
      parseTest expr "MyVar = 0"
    

    Then, I'd write a type checker, probably something like the following. Note that for type checking, we only need to verify that the variable names exist; we don't need their values. However, I've used a single Ctx type for both type checking and evaluation.

    -- variable context (i.e., variable name/value pairs)
    type Ctx = [(String, Int)]
    
    data Expr a where
      I   :: Int  -> Expr Int
      B   :: Bool -> Expr Bool
      Var :: String -> Expr Int
      Add :: Expr Int -> Expr Int -> Expr Int
      Eq  :: (Show (Expr a), Eq a) => Expr a -> Expr a -> Expr Bool
      Mt  :: (Show (Expr a), Ord a) => Expr a -> Expr a -> Expr Bool
    deriving instance Show (Expr Bool)
    deriving instance Show (Expr Int)
    
    tc_bool :: UExpr -> Reader Ctx (Expr Bool)
    tc_bool (UB b) = pure $ B b
    tc_bool (UEq x y) = Eq <$> tc_int x <*> tc_int y
    tc_bool (UMt x y) = Mt <$> tc_int x <*> tc_int y
    tc_bool _ = error "type error: expecting a boolean expression"
    
    tc_int :: UExpr -> Reader Ctx (Expr Int)
    tc_int (UI n) = pure $ I n
    tc_int (UVar sym)
      = do mval <- asks (lookup sym)
           case mval of Just _ -> pure $ Var sym
                        _      -> error "type error: undefined variables"
    tc_int (UAdd x y) = Add <$> tc_int x <*> tc_int y
    tc_int _ = error "type error: expecting an integer expression"
    
    test_tc :: IO ()
    test_tc = do
      print $ run_tc_bool (UMt (UI 3) (UI 2))
      print $ run_tc_bool (UEq (UVar "MyVar") (UI 0))
      -- now some type errors
      handle showError $ print $ run_tc_bool (UMt (UB False) (UI 2))
      handle showError $ print $ run_tc_bool (UAdd (UEq (UI 1) (UI 1)) (UI 1))
    
      where showError :: ErrorCall -> IO ()
            showError e = print e
    
            run_tc_bool e = runReader (tc_bool e) [("MyVar", 42)]
    

    You may be surprised to learn that the most natural way to write a type checker doesn't actually "use" the GADT. It could have been equally easily written using two separate types for boolean and integer expressions. You would have found the same thing if you'd actually tried to parse directly into the GADT. The parser code would need to be pretty cleanly divided between a parser for boolean expressions of type Parser (Expr Bool) and a parser for integer expressions of type Parser (Expr Int), and there'd be no straightforward way to write a single Parser (Expr a).

    Really, the advantage of the GADT representation only comes at the evaluation stage where you can write a simple, type-safe evaluator that triggers no "non-exhaustive pattern" warnings, like so:

    eval :: Expr a -> Reader Ctx a
    eval (I n) = pure n
    eval (B b) = pure b
    eval (Var sym) = fromJust <$> asks (lookup sym)
    eval (Add x y) = (+) <$> eval x <*> eval y
    eval (Eq x y) = (==) <$> eval x <*> eval y
    eval (Mt x y) = (>) <$> eval x <*> eval y
    
    test_eval :: IO ()
    test_eval = do
      print $ run_eval (Mt (I 3) (I 2))
      print $ run_eval (Eq (Var "MyVar") (I 0))
    
      where run_eval e = runReader (eval e) [("MyVar", 42)]