General Question
I have a pair of datatypes that are two different ways of representing the same thing, one records the variable name in String, while the other one records the variable name in Int. Currently, they're both defined. However, I would like to describe just first representation, and the second one would be generated by some relation.
Specific Example
Concretely, the first one is the user defined version of a STLC term universe, while the second one is the de Bruijn–indexed version of the same thing:
{-# LANGUAGE RankNTypes, GADTs, DataKinds, TypeOperators, KindSignatures, PolyKinds, TypeFamilies, UndecidableInstances, FunctionalDependencies, FlexibleInstances #-}
-- * Universe of Terms * --
data Term :: Type -> * where
Var :: Id -> Term a
Lam :: Id -> Type -> Term b -> Term (a :-> b)
App :: Term (a :-> b) -> Term a -> Term b
Let :: Id -> Term a -> Term b -> Term b
Tup :: Term a -> Term b -> Term (a :*: b)
Lft :: Term a -> Term (a :+: b)
Rgt :: Term b -> Term (a :+: b)
Tru :: Term Boolean
Fls :: Term Boolean
Uni :: Term Unit
data TermIn :: Type -> * where
VarI :: Idx -> Info -> TermIn a
LamI :: Type -> TermIn b -> Info -> TermIn (a :-> b)
AppI :: TermIn (a :-> b) -> TermIn a -> TermIn b
LetI :: TermIn a -> TermIn b -> Info -> TermIn b
TupI :: TermIn a -> TermIn b -> TermIn (a :*: b)
TruI :: TermIn Boolean
FlsI :: TermIn Boolean
UniI :: TermIn Unit
-- * Universe of Types * --
data Type
= Type :-> Type
| Type :*: Type
| Type :+: Type
| Boolean
| Unit
| Void
-- * Synonyms * --
type Id = String -- * variable name
type Idx = Int -- * de-brujin's index
data Info = Rec Id String -- * store original name and extra info
There is already a relationship defined over Term
and TermIn
:
class DeBruijnPair a b | a -> b, b -> a where
decode :: b -> a
encode :: a -> b
instance DeBruijnPair (Term a) (TermIn a) where
decode = undefined
encode = undefined
Please note since the original name of Term
is stored in TermIn
, there's a one-to-one mapping of Term
to TermIn
, and back.
Restate the Question
Now you can see how much boiler plate is involved, which I'd like to get rid of using some elegant abstraction where I define Term
and some function over types outputs TermIn
. Just to provide even more context, I'm creating many pairs of such external and de Bruijn representations for different lambda-calculus formulations, and the one-to-one relation holds for all of them.
The first step is to separate out the parts that are specific to each representation (Var
, Lam
, Let
) from the rest of the definitions.
data AbstractTerm ∷ Type → ★ where
App ∷ AbstractTerm (a :-> b) → AbstractTerm a → AbstractTerm b
Tup ∷ AbstractTerm a → AbstractTerm b → AbstractTerm (a :*: b)
Lft ∷ AbstractTerm a → AbstractTerm (a :+: b)
Rgt ∷ AbstractTerm b → AbstractTerm (a :+: b)
Tru, Fls ∷ AbstractTerm Boolean
Uni ∷ AbstractTerm Unit
data Term ∷ Type → ★ where
Var ∷ Id → Term a
Lam ∷ Id → Type → Term b → Term (a :-> b)
Let ∷ Id → Term a → Term b → Term b
data TermIn ∷ Type → ★ where
VarI ∷ Idx → Info → TermIn a
LamI ∷ Type → TermIn b → Info → TermIn (a :-> b)
LetI ∷ TermIn a → TermIn b → Info → TermIn b
Then, you need to combine the ‘generic’ part with the concrete representation you want. There's a well-known trick to building inductive definitions into smaller chunks: you refactor the inductive call out of the data-type, making the types of sub-elements a parameter to the type (in this case, a function over your Type kind, since you need to keep track of the object-language type).
data AbstractTerm (t ∷ Type → ★) ∷ Type → ★ where
App ∷ t (a :-> b) → t a → AbstractTerm t b
Tup ∷ t a → t b → AbstractTerm t (a :*: b)
Lft ∷ t a → AbstractTerm t (a :+: b)
Rgt ∷ t b → AbstractTerm t (a :+: b)
Tru, Fls ∷ AbstractTerm t Boolean
Uni ∷ AbstractTerm t Unit
data Term (t ∷ Type → ★) ∷ Type → ★ where
Var ∷ Id → Term t a
Lam ∷ Id → Type → t b → Term t (a :-> b)
Let ∷ Id → t a → t b → Term t b
data TermIn (t ∷ Type → ★) ∷ Type → ★ where
VarI ∷ Idx → Info → TermIn t a
LamI ∷ Type → t b → Info → TermIn t (a :-> b)
LetI ∷ t a → t b → Info → TermIn t b
To instantiate this type, you can use an inductive type definition that sums itself with the abstract type to obtain the parameter to give to the abstract type.
newtype ConcreteTermNamed ty
= ConcreteTermNamed (Either (Term ConcreteTermNamed ty)
(AbstractTerm ConcreteTermNamed ty))
newtype ConcreteTermNameless ty
= ConcreteTermNameless (Either (TermIn ConcreteTermNameless ty)
(AbstractTerm ConcreteTermNameless ty))
This introduces a little additional noise to choose whether you want a representation-agnostic or -specific term at each level, plus the Haskell-mandated newtype wrapper, but otherwise leaves your term language as it was.
var ∷ ConcreteTermNamed (Boolean :*: Unit)
var = ConcreteTermNamed
(Right (Tup (ConcreteTermNamed (Left (Var "x")))
(ConcreteTermNamed (Left (Var "y")))))
fun ∷ ConcreteTermNamed (Boolean :-> (Unit :*: Boolean))
fun = ConcreteTermNamed (Left
(Lam "b" Boolean
(ConcreteTermNamed (Right
(Tup (ConcreteTermNamed (Right Uni))
(ConcreteTermNamed (Left (Var "b"))))))))
This trick can be used to sum any number of different inductive types, and is often used to break down a larger language into smaller, more modular sub-languages; for example, it might be good style to split your AbstractTerm up further into application, product, sum, and unit types, and then merge them all together by adding them into the sum type.