haskell functional-programming algebraic-data-types church-encoding scott-encoding

Why do we use folds to encode datatypes as functions?

Or to be specific, why do we use foldr to encode lists and iteration to encode numbers?

Sorry for the longwinded introduction, but I don't really know how to name the things I want to ask about so I'll need to give some exposition first. This draws heavily from this C.A.McCann post that just not quite satisfies my curiosity and I'll also be handwaving the issues with rank-n-types and infinite lazy things.

One way to encode datatypes as functions is to create a "pattern matching" function that receives one argument for each case, each argument being a function that receives the values corresponding to that constructor and all arguments returning a same result type.

This all works out as expected for non-recursive types

--encoding data Bool = true | False
type Bool r = r -> r -> r

true :: Bool r
true = \ct cf -> ct

false :: Bool r
false = \ct cf -> cf

--encoding data Either a b = Left a | Right b
type Either a b r = (a -> r) -> (b -> r) -> r

left :: a -> Either a b r
left x = \cl cr -> cl x

right :: b -> Either a b r
right y = \cl cr -> cr y

However, the nice analogy with pattern matching breaks down with recursive types. We might be tempted to do something like

--encoding data Nat = Z | S Nat
type RecNat r = r -> (RecNat -> r) -> r
zero = \cz cs -> cz
succ n = \cz cs -> cs n

-- encoding data List a = Nil | Cons a (List a)
type RecListType a r = r -> (a -> RecListType -> r) -> r
nil = \cnil ccons -> cnil
cons x xs = \cnil ccons -> ccons x xs

but we can't write those recursive type definitions in Haskell! The usual solution is to force the callback of the cons/succ case to be applied to all levels of recursion instead of just the first one (ie, writing a fold/iterator). In this version we use the return type r where the recursive type would be:

--encoding data Nat = Z | S Nat
type Nat r = r -> (r -> r) -> r
zero = \cz cf -> cz
succ n = \cz cf -> cf (n cz cf)

-- encoding data List a = Nil | Cons a (List a)
type recListType a r = r -> (a -> r -> r) -> r
nil = \z f -> z
cons x xs = \z f -> f x (xs z f)

While this version works, it makes defining some functions much harder. For example, writing a "tail" function for lists or a "predecessor" function for numbers is trivial if you can use pattern matching but gets tricky if you need to use the folds instead.

So onto my real questions:

How can we be sure that the encoding using folds is as powerful as the hypothetical "pattern matching encoding"? Is there a way to take an arbitrary function definition via pattern matching and mechanically convert it to one using only folds instead? (If so, this would also help make tricky definitions such as tail or foldl in terms of foldr as less magical)
Why doesn't the Haskell type system allow for the recursive types needed in the "pattern matching" encoding?. Is there a reason for only allowing recursive types in datatypes defined via data? Is pattern matching the only way to consume recursive algebraic datatypes directly? Does it have to do with the type inferencing algorithm?

Solution

The Wikipedia page on Scott encoding has some useful insights. The short version is, what you're referring to is the Church encoding, and your "hypothetical pattern-match encoding" is the Scott encoding. Both are sensible ways of doing things, but the Church encoding requires lighter type machinery to use (in particular, it does not require recursive types).

The proof that the two are equivalent uses the following idea:

churchfold :: (a -> b -> b) -> b -> [a] -> b
churchfold _ z [] = z
churchfold f z (x:xs) = f x (churchfold f z xs)

scottfold :: (a -> [a] -> b) -> b -> [a] -> b
scottfold _ z [] = z
scottfold f _ (x:xs) = f x xs

scottFromChurch :: (a -> [a] -> b) -> b -> [a] -> b
scottFromChurch f z xs = fst (churchfold g (z, []) xs)
 where
  g x ~(_, xs) = (f x xs, x : xs)

The idea is that since churchfold (:) [] is the identity on lists, we can use a Church fold that produces the list argument it is given as well as the result it is supposed to produce. Then in the chain x1 `f` (x2 `f` (... `f` xn) ... ) the outermost f receives a pair (y, x2 : ... : xn : []) (for some y we don't care about), so returns f x1 (x2 : ... : xn : []). Of course, it also has to return x1 : ... : xn : [] so that any more applications of f could also work.

(This is actually a little similar to the proof of the mathematical principle of strong (or complete) induction, from the "weak" or usual principle of induction).

By the way, your Bool r type is a bit too big for real Church booleans – e.g. (+) :: Bool Integer, but (+) isn't really a Church boolean. If you enable RankNTypes then you can use a more precise type: type Bool = forall r. r -> r -> r. Now it is forced to be polymorphic, so genuinely only contains two (ignoring seq and bottom) inhabitants – \t _ -> t and \_ f -> f. Similar ideas apply to your other Church types, too.