haskelltypesrecursion-schemescatamorphismanamorphism

Memoizing a recursion scheme


Is it possible to memoize a recursion scheme? If so, how would you?

For example, the following uses anamophism and catamorphism

newtype Fix f = In (f (Fix f))

deriving instance (Eq (f (Fix f))) => Eq (Fix f)
deriving instance (Ord (f (Fix f))) => Ord (Fix f)
deriving instance (Show (f (Fix f))) => Show (Fix f)

out :: Fix f -> f (Fix f)
out (In f) = f

-- Catamorphism
type Algebra f a = f a -> a

cata :: (Functor f) => Algebra f a -> Fix f -> a                                                                                                                                
cata f = f . fmap (cata f) . out                                                                                                                                                
                                                                                                                                                                                
-- Anamorphism                                                                                                                                                                  
type Coalgebra f a = a -> f a                                                                                                                                                   
                                                                                                                                                                                
ana :: (Functor f) => Coalgebra f a -> a -> Fix f                                                                                                                               
ana f = In . fmap (ana f) . f 

to solve the lattice paths problem:

latticePaths m n = cata countPathsAlgNoMemo (ana buildLattice (m, n))                                                                           
 
-- recursive solution without dynamic programming                                                                                                    
buildLattice :: (Int, Int) -> LeafBTreeF Int (Int, Int)                                                                                              
buildLattice (m, n)                                                                                                                                  
        | m == 0 && n == 0 = LeafBTreeLeafF 1                                                                                                        
        | m < 0 || n < 0 = LeafBTreeLeafF 0                                                                                                          
        | otherwise = LeafBTreeNodeF (m - 1, n) (m, n - 1)
                                                                                                                                                     
countPathsAlgNoMemo :: LeafBTreeF Int Int -> Int                                                                                                
countPathsAlgNoMemo (LeafBTreeLeafF n) = n
countPathsAlgNoMemo (LeafBTreeNodeF a b) = a + b

It is inefficient because subproblems are recomputed instead of stored and reused. I would like to know if there is a way to store (or get the haskell compiler to store) previously computed subproblems.

I've had a look at some resources related to memoizing polymorphic functions (http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html, http://conal.net/blog/posts/memoizing-polymorphic-functions-part-two) but haven't been able understand how they might apply here.

NOTE: I'm specifically interested in whether apomorphism/paramorphism and anamorphism/catamorphism can be memoized (or if there is any other solution for storing subproblems using these recursion schemes). I understand that histomorphism and dynamorphism are suited to solve dynamic programming problems but for my purposes I want to limit my focus to apo/para or ana/cata.

My paramorphism and apomorphism:

-- Paramorphism
type RAlgebra f a = f (Fix f, a) -> a                                                                                          
        
para :: (Functor f) => RAlgebra f a -> Fix f -> a
para rAlg = rAlg . fmap fanout . out
        where fanout t = (t, para rAlg t)
                                                                                                                                                                                
-- Apomorphism
type RCoalgebra f a = a -> f (Either (Fix f) a)                                        
                                                                                       
apo :: Functor f => RCoalgebra f a -> a -> Fix f                                       
apo rCoalg = In . fmap fanin . rCoalg                                                                                                                                           
        where fanin = either id (apo rCoalg)


Solution

  • Update: See below on paramorphism/apomorphism compositions.

    Note that memoizing cata f and ana g separately is pointless. The problem is that building a fresh lattice from scratch:

    ana buildLattice (20,20)
    

    is basically no more difficult than reading that structure from a pre-generated copy in memory. It would be like memoizing replicate 1000000000 'x'. It doesn't make any sense.

    Same with cata countPathsAlg. Calculating the number of paths is easy. The hard part is traversing the structure (whether to perform the calculation or to look it up as a key in a memo table). If you want to memoize a catamorphism efficiently, you need to represent large structures by simple keys. But we already have simple keys for our structures -- they're the values we pass to the anamorphism to generate those structures!

    In other words, to make this computation amenable to memoization, you need to eliminate the intermediate data structure by fusing the catamorphism and anamorphism.

    This isn't too tough. Suppose we have:

    h = cata f . ana g
    

    Then, it follows that:

    h
    = cata f . ana g
    = f . fmap (cata f) . out . In . fmap (ana g) . g
    = f . fmap (cata f . ana g) . g
    = f . fmap h . g
    

    Note that this is a refold AKA hylo from the recursion-schemes package:

    hylo f g = h where h = f . fmap h . g
    

    For your example, we have the fused recursion:

    latticePaths :: (Int, Int) -> Int
    latticePaths = countPathsAlg . fmap latticePaths . buildLattice
    

    which is easily memoized with, say, the memoize package:

    import Data.Function.Memoize
    
    latticePaths :: (Int, Int) -> Int
    latticePaths = h where h = memoize (countPathsAlg . fmap h . buildLattice)
    
    main = print $ latticePaths (100,100)
    

    More generally, any hylomorphism can be memoized with:

    hylo_memo f g = h where h = memoize (f . fmap h . g)
    

    Full memoization of a paramorphism/apomorphism combination isn't possible. There are two problems.

    First, when an apo produces a Left, it can produce an arbitrarily complex structure. Even if you could memoize the root of this structure (as it's keyed by an a value), there's no efficient way to memoize identical subproblems within that structure. For example, suppose an apo produces a Left rose tree consisting of 1000 identical branches each of a million nodes. The only way to discover that all 1000 branches are identical and can share the same solution is to traverse the million nodes in each branch to compare them.

    Second, a paramorphism calculation can, in general, depend on the complete structure at the current node, not just some "small" combination of previously calculated paramorphism results from local structure defined by the algebra. For example, consider a paramorphism on a rose tree that sums up the number of times a node's value appears in its subtree, and adds those counts together across all nodes, and suppose it's implemented as follows:

    data Rose a t = Rose a [t]
    refCount :: (Eq a) => RAlgebra (Rose a) Int
    refCount (Rose a bs)
        -- reference counts from subnodes
      = sum (map snd bs)
        -- occurrences of "a" in the branches
      + sum (map (countValues a . fst) bs)
    
    -- count number of "a"s in the tree
    countValues :: (Eq a) => a -> Fix (Rose a) -> Int
    countValues a = ...
    

    Note that once we have the result for a "subproblem", all we have is the total reference count in that tree. If this same tree appears under five different parent nodes, we can reuse this result, but we still need to search the whole tree for the five different parent node values to construct their counts.

    This doesn't mean that memoization is useless. We can memoize the Right parts of the apomorphism, and memoization of the paramorphism results may still be useful even if full structure traversals are needed (e.g., for refCount above, partial memoization reduces the complexity from cubic to quadratic, I think).

    So, if you continue your expansion:

    h
    = para f . apo g
    = f . fmap ((\t -> (t, para f t)) . either id (apo g)) . g
    = f . fmap k . g
      where k = \case Left t -> (t, para f t)
                      Right a -> (apo g, h a)
    

    we can achieve the partial memoization:

    paraApo :: (Ord a, Functor f) => RAlgebra f b -> RCoalgebra f a -> a -> b
    paraApo f g = h
      where h = memo $
              f . fmap k . g
              where k = \case Left t -> (t, para f t)
                              Right a -> (apo g a, h a)
    

    which may be useful. The unmemoized parts will be the para f t when the apo produces a Left and the potential costly operation of the para on its first argument produced by apo g a in the Right branch.

    A catamorphism/apomorphism combination is also partially memoizable. We still have the problem that the apo can produce Lefts with arbitrary, unmemoizable structure, but the Rights of the apo can be fully memoized.

    h
    = cata f . apo g
    = f . fmap (cata f . either id (apo g)) . g
    = f . fmap k . g
      where k = \case Left t  -> cata f t      -- non-memoizable
                      Right a -> h a           -- memoizable