I'm implementing an interpreter for a functional language that will need to do runtime type checking, as different expressions in the input language return different types of values. I started off with a basic boxing data type like this:
-- Extended value type
data Value
= StringValue T.Text
| IntValue Integer
| BoolValue Bool
| Error EvalError
deriving (Eq, Show)
This values would be returned from expressions in the parse tree on evaluation, something like this:
data Expression = Constant Value a |
UnaryOp (Value a -> Value b) Expression |
BinaryOp (Value a -> Value b -> Value c) Expression Expression
For defining operations, I created a helper function:
lift :: (Int -> Int -> Int) -> Value -> Value -> Value
lift f (IntValue x) (IntValue y) = IntValue $ f x y
lift _ _ _ = Error TypeError
Which could be used to build operators like this:
case op of
'+' -> (BinaryOp (lift (+)) arg1 arg2, rest2)
-- etc.
Implementation started to get a bit cumbersome, and I thought it was a little weird to define my own lift
method and all the other surrounding operators, when Haskell has type classes for exactly this reason. I thought about trying to make Value
an instance of Functor
and Applicative
, but that doesn't really work without Value
having a type variable, then I learned about GADTs, and I thought I could use that for my Value
type, so I changed it into this:
data Value a where
StringValue :: T.Text -> Value T.Text
IntValue :: Int -> Value Int
BoolValue :: Bool -> Value Bool
Error :: EvalError -> Value EvalError
Now I'm trying to make a Functor
and Applicative
instance. I started off with my Functor
instance looking like this:
instance Functor Value where
fmap f (StringValue t) = StringValue $ f t
fmap f (IntValue x) = IntValue $ f x
fmap f (BoolValue b) = BoolValue $ f b
fmap _ _ = Error TypeError
I thought it was a bit silly to have all of those repeated fmap
instances, and it would get really tricky to manage if I wanted to have combinations of types, like a function mapping an int to a string for example. I thought maybe I could get around this by using Applicative
and helper functions, so now I have this:
instance Functor Value where
fmap f v = pure $ f (extractValue v)
extractValue :: Value a -> a
extractValue (StringValue t) = t
extractValue (IntValue n) = n
extractValue (BoolValue b) = b
extractValue (Error e) = e
It seems a little silly to me to have all of these copies of unboxing code that are all doing the same thing, and my pure
implementation is going to also need a bunch of copies going the other way. I could just make Value
a generic constructor, but then I won't be able to pattern match on the value type later on, and I'm sure that's going raise issues like with the addition implementation above if the compiler can't be sure the value type actually is an int.
It would be nice if I could just pattern match on a generic kind of constructor, but I don't know if that's possible in Haskell. If I could just have all of the constructors match against something like
extractValue :: Value a -> a
extractValue (T t) = t
Instance Applicative Value where
pure x = T x
-- etc.
that would be great. Is there a way to do that in Haskell?
An easy way to avoid duplication is to factor out the "type" witnesses:
data Ty a where
StringTy :: Ty T.Text
IntTy :: Ty Int
BoolTy :: Ty Bool
ErrorTy :: Ty EvalError
data Value a = Value (Ty a) a
You can then use pattern synonyms to recover your original constructors:
pattern StringValue a = Value StringTy a
pattern IntValue a = Value IntTy a
pattern BoolValue a = Value BoolTy a
pattern Error a = Value ErrorTy a
The extractValue
function is now straightforward to implement:
extractValue :: Value a -> a
extractValue (Value _ a) = a
However, as pointed out by others in the comments, Value
is not a Functor
on all of Hask. The best you can do is define its action on endomorphisms:
emap :: (a -> a) -> Value a -> Value a
emap f (Value t a) = Value t (f a)
Or take an explicit witness that the target type is in Ty
:
fmap' :: Ty b -> (a -> b) -> Value a -> Value b
fmap' t' f (Value t a) = Value t' (f a)
Very informally, Value
is the inclusion of the full subcategory of Hask carved out by the types in Ty
, so you could reasonably see it as an "exofunctor".
Depending on your use case, it might make more sense to use the "identity functor" instead of Value
(so a
instead of Value a
) and constrain a
separately when needed, possibly using a typeclass instead of a GADT.