I would like to parse a predicate such as: "3 > 2"
or "MyVar = 0"
.
Ideally, I would use a small GADT to represent the predicate:
data Expr a where
I :: Int -> Expr Int
B :: Bool -> Expr Bool
Var :: String -> Expr Int
Add :: Expr Int -> Expr Int -> Expr Int
Eq :: Eq a => Expr a -> Expr a -> Expr Bool
Mt :: Eq a => Expr a -> Expr a -> Expr Bool
The expression 3 > 2
would parse as Mt (I 3) (I 2)
.
I tried to approach the problem with Parsec.
However, the module Text.Parsec.Expr deals only with expressions, with type a -> a -> a
.
Any suggestions?
Parsing directly into a GADT is actually kind of tricky. In my experience, it's usually better to parse into an untyped ADT first (where the a -> a -> a
types are a natural fit), and then separately "type check" the ADT by transforming it into the desired GADT. The main disadvantage is that you have to define two parallel types for the untyped and typed abstract syntax trees. (You can technically get around this with some type level tricks, but it's not worth it for a small language.) However, the resulting design is easier to work with and generally more flexible.
In other words, I'd suggest using Parsec to parse into the untyped ADT:
data UExpr where
UI :: Int -> UExpr
UB :: Bool -> UExpr
UVar :: String -> UExpr
UAdd :: UExpr -> UExpr -> UExpr
UEq :: UExpr -> UExpr -> UExpr
UMt :: UExpr -> UExpr -> UExpr
and then write a type checker:
tc :: UExpr -> Expr a
Actually, you won't be able to write a tc
like that. Instead you'll need to break it up into mutually recursive type checkers for different expression types:
tc_bool :: UExpr -> Expr Bool
tc_int :: UExpr -> Expr Int
and you'll probably want to run them in a Reader
monad that provides a list of valid variables. (Type checking normally involves checking the types of variabels. In your case, you only have integer variables, but it still makes sense to ensure the variables are defined at the type checking stage.)
If you get stuck, a full solution follows...
SPOILERS
.
.
.
As I say, I'd first write a Parsec parser for an untyped UExpr
ADT. Note that the Text.Parsec.Expr machinery works fine for UExpr -> UExpr -> UExpr
operators:
{-# LANGUAGE GADTs #-}
{-# LANGUAGE StandaloneDeriving #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE FlexibleContexts #-}
{-# OPTIONS -Wall -Wno-missing-signatures #-}
import Text.Parsec
import Text.Parsec.Expr
import Text.Parsec.String
import Text.Parsec.Language
import Control.Monad.Reader
import Control.Exception
import Data.Maybe (fromJust)
import qualified Text.Parsec.Token as P
lexer = P.makeTokenParser haskellDef { P.reservedNames = ["true","false"] }
identifier = P.identifier lexer
integer = P.integer lexer
parens = P.parens lexer
reserved = P.reserved lexer
reservedOp = P.reservedOp lexer
symbol = P.symbol lexer
data UExpr where
UI :: Int -> UExpr
UB :: Bool -> UExpr
UVar :: String -> UExpr
UAdd :: UExpr -> UExpr -> UExpr
UEq :: UExpr -> UExpr -> UExpr
UMt :: UExpr -> UExpr -> UExpr
deriving (Show)
expr :: Parser UExpr
expr = buildExpressionParser
[ [Infix (UAdd <$ reservedOp "+") AssocLeft]
, [Infix (UEq <$ reservedOp "=") AssocNone, Infix (UMt <$ reservedOp ">") AssocNone]
] term
term :: Parser UExpr
term = parens expr
<|> UI . fromIntegral <$> integer
<|> UB True <$ reserved "true"
<|> UB False <$ reserved "false"
<|> UVar <$> identifier
test_parser :: IO ()
test_parser = do
parseTest expr "3 > 2"
parseTest expr "MyVar = 0"
Then, I'd write a type checker, probably something like the following. Note that for type checking, we only need to verify that the variable names exist; we don't need their values. However, I've used a single Ctx
type for both type checking and evaluation.
-- variable context (i.e., variable name/value pairs)
type Ctx = [(String, Int)]
data Expr a where
I :: Int -> Expr Int
B :: Bool -> Expr Bool
Var :: String -> Expr Int
Add :: Expr Int -> Expr Int -> Expr Int
Eq :: (Show (Expr a), Eq a) => Expr a -> Expr a -> Expr Bool
Mt :: (Show (Expr a), Ord a) => Expr a -> Expr a -> Expr Bool
deriving instance Show (Expr Bool)
deriving instance Show (Expr Int)
tc_bool :: UExpr -> Reader Ctx (Expr Bool)
tc_bool (UB b) = pure $ B b
tc_bool (UEq x y) = Eq <$> tc_int x <*> tc_int y
tc_bool (UMt x y) = Mt <$> tc_int x <*> tc_int y
tc_bool _ = error "type error: expecting a boolean expression"
tc_int :: UExpr -> Reader Ctx (Expr Int)
tc_int (UI n) = pure $ I n
tc_int (UVar sym)
= do mval <- asks (lookup sym)
case mval of Just _ -> pure $ Var sym
_ -> error "type error: undefined variables"
tc_int (UAdd x y) = Add <$> tc_int x <*> tc_int y
tc_int _ = error "type error: expecting an integer expression"
test_tc :: IO ()
test_tc = do
print $ run_tc_bool (UMt (UI 3) (UI 2))
print $ run_tc_bool (UEq (UVar "MyVar") (UI 0))
-- now some type errors
handle showError $ print $ run_tc_bool (UMt (UB False) (UI 2))
handle showError $ print $ run_tc_bool (UAdd (UEq (UI 1) (UI 1)) (UI 1))
where showError :: ErrorCall -> IO ()
showError e = print e
run_tc_bool e = runReader (tc_bool e) [("MyVar", 42)]
You may be surprised to learn that the most natural way to write a type checker doesn't actually "use" the GADT. It could have been equally easily written using two separate types for boolean and integer expressions. You would have found the same thing if you'd actually tried to parse directly into the GADT. The parser code would need to be pretty cleanly divided between a parser for boolean expressions of type Parser (Expr Bool)
and a parser for integer expressions of type Parser (Expr Int)
, and there'd be no straightforward way to write a single Parser (Expr a)
.
Really, the advantage of the GADT representation only comes at the evaluation stage where you can write a simple, type-safe evaluator that triggers no "non-exhaustive pattern" warnings, like so:
eval :: Expr a -> Reader Ctx a
eval (I n) = pure n
eval (B b) = pure b
eval (Var sym) = fromJust <$> asks (lookup sym)
eval (Add x y) = (+) <$> eval x <*> eval y
eval (Eq x y) = (==) <$> eval x <*> eval y
eval (Mt x y) = (>) <$> eval x <*> eval y
test_eval :: IO ()
test_eval = do
print $ run_eval (Mt (I 3) (I 2))
print $ run_eval (Eq (Var "MyVar") (I 0))
where run_eval e = runReader (eval e) [("MyVar", 42)]