haskellrecursioninfinite-loopinterpretationarrow-abstraction

Why does recursive binding of arrow function in Haskell loop infinitely?


I'm learning how to use Arrows in Haskell by implementing a simple interpreter for the comm language.

I have an eval function which evaluates an expression into a value, but eval loops indefinitely when feeding in any expression.

-- Interp.hs

eval :: A Expr Val
eval = proc e -> case e of
    Lit x -> returnA -< Num x
    Var s -> do
        lookup -< s
    Add e1 e2 -> do
        v1 <- eval -< e1
        v2 <- eval -< e2
        case (v1, v2) of
            (Num x, Num y) -> returnA -< Num (x + y)

Executing this in GHCI results in an infinite loop

*Interp> unpack eval M.empty (Lit 1)

Commenting out the eval's in the case of the Add expression does result in the expression giving a result

e.g.

-- Interp.hs

eval :: A Expr Val
eval = proc e -> case e of
    Lit x -> returnA -< Num x
    Var s -> do
        lookup -< s
    Add e1 e2 -> do
        returnA -< Num 1
--        v1 <- eval -< e1
--        v2 <- eval -< e2
--        case (v1, v2) of
--            (Num x, Num y) -> returnA -< Num (x + y)
*Interp> unpack eval M.empty (Lit 1)
(Right (Num 1),fromList [])

Here's the code in question
The arrow used is a sort of state function that keeps passing context after failure.

{-# LANGUAGE Arrows #-}
{-# OPTIONS_GHC -Wall #-}
module Interp where

import Prelude hiding (lookup, fail)

import qualified Data.Map as M
import Control.Arrow
import Control.Category

data Expr
    = Lit Int
    | Var String
    | Add Expr Expr
    deriving (Show, Eq)

data Val
    = Num Int
    deriving (Show, Eq)

type Env = M.Map String Val

data A b c = A { unpack :: (Env -> b -> (Either String c, Env)) }

instance Category A where
    id = A (\env b -> (Right b, env))
    A g . A f = A $ \env b -> case f env b of
        (Left err, env') -> (Left err, env')
        (Right c, env') -> g env' c

instance Arrow A where
    arr f = A $ \env b -> (Right (f b), env)
    first (A f) = A $ \env (b, d) -> case f env b of
        (Left err, env') -> (Left err, env')
        (Right c, env') -> (Right (c, d), env')

instance ArrowChoice A where
    left (A f) = A $ \env e -> case e of
        Left b -> case f env b of
            (Left err, env') -> (Left err, env')
            (Right c, env') -> (Right (Left c), env')
        Right d -> (Right (Right d), env)

lookup :: A String Val
lookup = A $ \env k -> case M.lookup k env of
    Nothing -> (Left "Variable not bound", env)
    Just v -> (Right v, env)

eval :: A Expr Val
eval = proc e -> case e of
    Lit x -> returnA -< Num x
    Var s -> do
        lookup -< s
    Add e1 e2 -> do
        v1 <- eval -< e1
        v2 <- eval -< e2
        case (v1, v2) of
            (Num x, Num y) -> returnA -< Num (x + y)


Solution

  • There are two ways you can fix the non-termination:

    1. Change A from a data declaration to a newtype declaration.
    2. Change the pattern matching in the definitions of first and left to be lazy, i.e.:

      first ~(A f) = A $ ...same as before...
      left ~(A f) = A $ ...same as before...
      

    Both of these fixes address the same root issue. From another StackOverflow answer:

    [With data declarations], when pattern matching against value constructors, [thunks] WILL be evaluated at least to weak head normal form (WHNF). [...] [With newtype declarations,] when pattern matching against value constructors, [thunks] WILL NOT be evaluated at all.

    To continue, I will explain a major difference between newtype and data declarations, and then I will explain how it applies to your case.

    Strictness properties of newtype and data

    Most of the following is paraphrased and adapted from the HaskellWiki article on the same topic.

    newtype is intended to introduce a type that is exactly isomorphic to an existing type. Given a declaration newtype T1 = T1 { unpack :: Int }, we would like unpack . T1 and id to be equal at the type Int -> Int, and likewise for T1 . unpack and id to be equal at the type T1 -> T1.

    But why does this not already hold for data T2 = T2 { unpack :: Int }; that is, why is T2 . unpack not the same as id? The reason is non-termination. T2 (unpack _|_) evaluates to T2 _|_, but id _|_ evaluates to _|_ (where I'm using _|_ to stand for the non-terminating computation "bottom", as is customary). We can distiguish _|_ and T2 _|_ with the following expression:

    \x -> case x of T2 _ -> ()

    Applying this lambda to _|_ yields _|_, but applying this lambda to T2 _|_ yields (). That is, because the constructor T2 can possibly hold a non-terminating value, the language makes it possible to distinguish _|_ and T2 _|_.

    The newtype declaration gives us a perhaps surprising property: T1 (unpack _|_) evaluates to _|_, but case _|_ of T1 _ -> () evaluates to ()! That is, because T1 _|_ is not intended to be distinguishable from _|_, the language does not force evaluation of values of type T1 when pattern matching against the T1 value constructor. And, as we'll see shortly, this property is important with recursive definitions.

    Lazy patterns

    There is a way to recover a newtype-like behavior for data declarations: use lazy pattern matching. Given a function like:

    f x = case x of ~(T2 y) -> \() -> y
    

    you will find that both f (T2 _|_) and f _|_ evaluate to a value (i.e., a function of type Unit -> Int that would diverge as soon as it is applied to an argument). This is because the lazy pattern match is identical to the function:

    f = \x -> case x of t -> \() -> unpack t
    

    and so the evaluation of the thunk passed as x is "deferred" in the sense that it is only evaluated when unpack t is evaluated, in the body of the lambda \() -> unpack t.

    Newtype vs. data declarations for your example

    I will now desugar the arrow notation in your example and then identify the source of the non-termination.

    To desugar the arrow notation, I used the arrowp program. This is what it yields on your program:

    $ arrowp-ext interp.hs
    eval :: A Expr Val
    eval
           = (arr
           (\ e ->
              case e of
                  Lit x -> (Left) (x)
                  Var s -> (Right) ((Left) (s))
                  Add e1 e2 -> (Right) ((Right) ((e1, e2))))
           >>>
           (((arr (\ x -> Num x)) >>> (returnA)) |||
              (((arr (\ s -> s)) >>> (lookup)) |||
                 (((first ((arr (\ e1 -> e1)) >>> (eval))) >>>
                     (arr (\ (v1, e2) -> (e2, v1))))
                    >>>
                    (first ((arr (\ e2 -> e2)) >>> (eval))) >>>
                      (arr
                         (\ (v2, v1) ->
                            case
                              case (v1, v2) of
                                  (Num x, Num y) -> (x, y)
                              of
                                (x, y) -> Num (x + y)))
                        >>> (returnA)))))
    

    You identified that eval doesn't terminate when applied to an argument, but the problem is more serious than that: eval diverges, period. This can be observed in ghci:

    *Interp> eval `seq` 0
    

    The proximal cause of the non-termination is the expression first ((arr (\ e1 -> e1)) >>> (eval)), which is the same as first eval. The first parameter in the definition of first involves a strict pattern match against a value constructor (first (A f)), and A originates from a data declaration, and so, re-quoting @wonder.mice:

    [With data declarations], when pattern matching against value constructors, [thunks] WILL be evaluated at least to weak head normal form (WHNF).

    When the expression first eval is encountered in the definition of eval, the thunk for eval has not yet been evaluated to WHNF, and so when first eval attempts to evaluate its argument to WHNF, it will not terminate.

    One way to fix this is to change A to a newtype declaration. ("1. Change A from a data declaration to a newtype declaration.") Then,

    ...when pattern matching against value constructors, [thunks] WILL NOT be evaluated at all.

    The other way to fix this is to make first use a lazy pattern match instead of a strict one ("2. Change the pattern matching in the definitions of first and left to be lazy"). This will have the net effect of making the pattern match behave the same as for a newtype declaration, where "thunks WILL NOT be evaluated at all".

    A similar explanation can be used for why ||| is not behaving in a lazy manner for your example, even though it does for e.g. the Arrow instance for (->). Changing left to use a lazy pattern match will take care of this issue, or just using a newtype declaration will be sufficient.