Constraining Growing Trees: Type families and Type classes

This question is about designing abstractions around the Trees That Grow idiom.

Suppose I have a very simple syntax tree:

data Expr χ  
  = Lam  (Lamfam χ) Name (Ty χ) (Expr χ)
  | App  (Appfam χ) (Expr χ)
  | Unit (Ufam χ)

data Ty χ  
  = UnitTy (Utyfam χ)
  | Arrow (Arrfam χ) (Ty χ) (Ty χ)

type family Lamfam χ
type family Appfam χ
type family Ufam χ
type family Utyfam χ
type family Arrfam χ

My question arises in looking for a way to place constraints on this set of type families. The paper provides one solution: constraint kinds:

type ForAllχ (c :: * -> Constraint) χ
  = (c (LamFam χ), c (Appfam χ), c (Ufam χ), c (Utyfam χ), c (Arrfam χ))

  
Forallχ Show χ => Show (Ty χ) where   
  show = ...

Forallχ Show χ => Show (Exp χ)  
  show = ...

However, this only works if I want to place the same constraint on all types.

Suppose, for example, that I'm writing a bidirectional type checker algorithm which is generic over parameter χ.

infer env term = case term of 
    ...
    Lam ex name ty body -> do
      ret_ty <- infer (insert name ty env) body
      pure $ Arr ??? a ret_ty
    ...

We don't have anything to plug in where the ??? is. To solve this, maybe I want there to be a function lamToArr :: (Lamfam χ) -> (Arrfam χ) which allows the type to decide how extension information flows through the type checker.

infer lamtoArr env term = case term of 
    ...
    Lam ex name ty body -> do
      ret_ty <- infer lamtoArr (insert name ty env) body
      pure $ Arr lamtoArr a ret_ty
    ...

Of course, as add more functions this can get awkward, so ideally, I'd like to be able to use typeclasses - something like:

class Checker χ where  
  lamToArr :: (Lamfam χ) -> (Arrfam χ)
  unitToTy :: (Ufam χ) -> (Utyfam χ)

However, this produces a type error:

    • Couldn't match type: Lamfam χ0
                     with: Lamfam χ
      Expected: Lamfam χ -> Arrfam χ
        Actual: Lamfam χ0 -> Arrfam χ0
      NB: ‘Lamfam’ is a non-injective type family
      The type variable ‘χ0’ is ambiguous
    • In the ambiguity check for ‘lamToArr’
      To defer the ambiguity check to use sites, enable AllowAmbiguousTypes
      When checking the class method:
        lamToArr :: forall χ. Checker χ => Lamfam χ -> Arrfam χ
      In the class declaration for ‘Checker’
   |
41 |   lamToArr :: (Lamfam χ) -> (Arrfam χ)
   |   ^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This error suggests that disabling the ambiguity check and/or making my type families injective (not quite sure what that means) could be a solution, but I wanted to check if any of this is a good solution to my problem in the first place, and if so then which of these two approaches is more what I'm looking for?

The general intent would be for the constraint do describe how information flows between types for a particular algorithm/stage, providing only what that particular stage needs. This would enable me to update or extend extensions to the tree without having to update the core algorithms.

Solution

In your case, (Lamfam χ) -> (Arrfam χ) is ambiguous because there is no guarantee that either Lamfam χ or Arrfam χ is enough to tell which χ you want at the call site. For example, a freshly parsed AST without annotations might set these both to some data SrcLoc.

There’s no problem with AllowAmbiguousTypes here. It says that you agree to write type annotations or TypeApplications sometimes at call sites. You could make these type families injective using TypeFamilyDependencies annotations:

type family Lamfam χ = λ | λ -> χ

Implicitly, a type family must be a function, whose input χ uniquely determines its output λ. This just adds a dependency in the other direction, that for any output λ that came from an application of Lamfam χ, it must be enough to tell what the input χ was. This is a pretty strict requirement that negates some of the intended benefits of TTG (avoiding wrapper types). Also there are some expressiveness issues with injectivity checking in current GHC. Nevertheless, it does tend to make inference and error messages a bit more helpful when it works. A common idiom for making a family injective is to keep the input type around as a phantom type parameter. Basically, replace stuff like this:

newtype Name = Name String
newtype Id   = Id   Int

type family Namefam χ
type instance Namefam Parsed  = Name
type instance Namefam Renamed = Id    -- conflict
type instance Namefam Typed   = Id    -- conflict

With stuff like this:

newtype Name  = Name String
newtype Id' χ = Id'  Int
-- or: Const Id χ

type family Namefam χ
type instance Namefam 'Parsed      = Name
type instance Namefam 'Renamed     = Id' 'Renamed
type instance Namefam 'Typechecked = Id' 'Typechecked

However, this is getting a bit astray. Normally with TTG, the typechecker wouldn’t be polymorphic in χ, it would change χ from 'Renamed to 'Typechecked by reannotating the tree. In that case, you have concrete types available, and you don’t need a typeclass.

Your class Checker χ only describes the set of AST states where you have enough information to run the typechecker. So I don’t know if it really pulls its weight as an abstraction, unless you want to be able to repeatedly re-typecheck the program between passes, say, to ensure nothing’s been broken by a rewrite.

That said, there is one noteworthy solution in GHC for the problem of large sets of constraints, which you might find helpful. First, they start with a Pass type to index the result of each pass.

data Pass
  = Parsed
  | Renamed
  | Typechecked

(There are only a few, because they only use TTG for the frontend right now, but if you’re writing a compiler using TTG from the get-go, you can certainly use it for more of the pipeline.)

Next, instead of indexing the AST and extension points with this type directly, like HsExpr 'Parsed, they use a GADT GhcPass (like Sing Pass with singletons).

data GhcPass (c :: Pass) where
  GhcPs :: GhcPass 'Parsed
  GhcRn :: GhcPass 'Renamed
  GhcTc :: GhcPass 'Typechecked

-- Type synonyms as a shorthand for tagging
type GhcPs   = GhcPass 'Parsed      -- Output of parser
type GhcRn   = GhcPass 'Renamed     -- Output of renamer
type GhcTc   = GhcPass 'Typechecked -- Output of typechecker

Finally they add a class IsPass (like SingI or KnownNat) which allows dispatching on the type in a uniform way.

class IsPass p where
  ghcPass :: GhcPass p

instance IsPass 'Parsed where
  ghcPass = GhcPs

instance IsPass 'Renamed where
  ghcPass = GhcRn

instance IsPass 'Typechecked where
  ghcPass = GhcTc

This is used especially for pretty-printing, and sometimes for constructing trees. It means that instead of carrying around a set of all the constraints they might need, like in ForAllχ c χ, they can use IsPass p => HsExpr (GhcPass p). With χ provided by a pattern-match, every c χ becomes available. Notably, you can even use such constraints in an ad-hoc way, without any particular class:

pprIfPs :: forall p. IsPass p => (p ~ 'Parsed => SDoc) -> SDoc
pprIfPs pp = case ghcPass @p of
  GhcPs -> pp
  _ -> empty

If you do want to write a lot of pass-polymorphic code, I find that it’s by far easier and clearer to use a data family or GADT in lieu of TTG. The compiler messages are much more helpful, and there are more tools available to reduce boilerplate, such as StandaloneDeriving.