In Scrap your boilerplate reloaded, the authors describe a new presentation of Scrap Your Boilerplate, which is supposed to be equivalent to the original.
However, one difference is that they assume a finite, closed set of "base" types, encoded with a GADT
data Type :: * -> * where
Int :: Type Int
List :: Type a -> Type [a]
...
In the original SYB, type-safe cast is used, implemented using the Typeable
class.
My questions are:
[I am one of the authors of the "SYB Reloaded" paper.]
TL;DR We really just used it because it seemed more beautiful to us. The class-based Typeable
approach is more practical. The Spine
view can be combined with the Typeable
class and does not depend on the Type
GADT.
The paper states this in its conclusions:
Our implementation handles the two central ingredients of generic programming differently from the original SYB paper: we use overloaded functions with explicit type arguments instead of overloaded functions based on a type-safe cast 1 or a class-based extensible scheme [20]; and we use the explicit spine view rather than a combinator-based approach. Both changes are independent of each other, and have been made with clarity in mind: we think that the structure of the SYB approach is more visible in our setting, and that the relations to PolyP and Generic Haskell become clearer. We have revealed that while the spine view is limited in the class of generic functions that can be written, it is applicable to a very large class of data types, including GADTs.
Our approach cannot be used easily as a library, because the encoding of overloaded functions using explicit type arguments requires the extensibility of the Type data type and of functions such as toSpine. One can, however, incorporate Spine into the SYB library while still using the techniques of the SYB papers to encode overloaded functions.
So, the choice of using a GADT for type representation is one we made mainly for clarity. As Don states in his answer, there are some obvious advantages in this representation, namely that it maintains static information about what type a type representation is for, and that it allows us to implement cast without any further magic, and in particular without the use of unsafeCoerce
. Type-indexed functions can also be implemented directly by using pattern matching on the type, and without falling back to various combinators such as mkQ
or extQ
.
Fact is that I (and I think the co-authors) simply were not very fond of the Typeable
class. (In fact, I'm still not, although it is finally becoming a bit more disciplined now in that GHC adds auto-deriving for Typeable
, makes it kind-polymorphic, and will ultimately remove the possibility to define your own instances.) In addition, Typeable
wasn't quite as established and widely known as it is perhaps now, so it seemed appealing to "explain" it by using the GADT encoding. And furthermore, this was the time when we were also thinking about adding open datatypes to Haskell, thereby alleviating the restriction that the GADT is closed.
So, to summarize: If you actually need dynamic type information only for a closed universe, I'd always go for the GADT, because you can use pattern matching to define type-indexed functions, and you do not have to rely on unsafeCoerce
nor advanced compiler magic. If the universe is open, however, which is quite common, certainly for the generic programming setting, then the GADT approach might be instructive, but isn't practical, and using Typeable
is the way to go.
However, as we also state in the conclusions of the paper, the choice of Type
over Typeable
isn't a prerequisite for the other choice we're making, namely to use the Spine
view, which I think is more important and really the core of the paper.
The paper itself shows (in Section 8) a variation inspired by the "Scrap your Boilerplate with Class" paper, which uses a Spine
view with a class constraint instead. But we can also do a more direct development, which I show in the following. For this, we'll use Typeable
from Data.Typeable
, but define our own Data
class which, for simplicity, just contains the toSpine
method:
class Typeable a => Data a where
toSpine :: a -> Spine a
The Spine
datatype now uses the Data
constraint:
data Spine :: * -> * where
Constr :: a -> Spine a
(:<>:) :: (Data a) => Spine (a -> b) -> a -> Spine b
The function fromSpine
is as trivial as with the other representation:
fromSpine :: Spine a -> a
fromSpine (Constr x) = x
fromSpine (c :<>: x) = fromSpine c x
Instances for Data
are trivial for flat types such as Int
:
instance Data Int where
toSpine = Constr
And they're still entirely straightforward for structured types such as binary trees:
data Tree a = Empty | Node (Tree a) a (Tree a)
instance Data a => Data (Tree a) where
toSpine Empty = Constr Empty
toSpine (Node l x r) = Constr Node :<>: l :<>: x :<>: r
The paper then goes on and defines various generic functions, such as mapQ
. These definitions hardly change. We only get class constraints for Data a =>
where the paper has function arguments of Type a ->
:
mapQ :: Query r -> Query [r]
mapQ q = mapQ' q . toSpine
mapQ' :: Query r -> (forall a. Spine a -> [r])
mapQ' q (Constr c) = []
mapQ' q (f :<>: x) = mapQ' q f ++ [q x]
Higher-level functions such as everything
also just lose their explicit type arguments (and then actually look exactly the same as in original SYB):
everything :: (r -> r -> r) -> Query r -> Query r
everything op q x = foldl op (q x) (mapQ (everything op q) x)
As I said above, if we now want to define a generic sum function summing up all Int
occurrences, we cannot pattern match anymore, but have to fall back to mkQ
, but mkQ
is defined purely in terms of Typeable
and completely independent of Spine
:
mkQ :: (Typeable a, Typeable b) => r -> (b -> r) -> a -> r
(r `mkQ` br) a = maybe r br (cast a)
And then (again exactly as in original SYB):
sum :: Query Int
sum = everything (+) sumQ
sumQ :: Query Int
sumQ = mkQ 0 id
For some of the stuff later in the paper (e.g., adding constructor information), a bit more work is needed, but it can all be done. So using Spine
really does not depend on using Type
at all.