I'm learning how to use Arrows in Haskell by implementing a simple interpreter for the comm language.
I have an eval function which evaluates an expression into a value, but eval loops indefinitely when feeding in any expression.
-- Interp.hs
eval :: A Expr Val
eval = proc e -> case e of
Lit x -> returnA -< Num x
Var s -> do
lookup -< s
Add e1 e2 -> do
v1 <- eval -< e1
v2 <- eval -< e2
case (v1, v2) of
(Num x, Num y) -> returnA -< Num (x + y)
Executing this in GHCI results in an infinite loop
*Interp> unpack eval M.empty (Lit 1)
Commenting out the eval's in the case of the Add expression does result in the expression giving a result
e.g.
-- Interp.hs
eval :: A Expr Val
eval = proc e -> case e of
Lit x -> returnA -< Num x
Var s -> do
lookup -< s
Add e1 e2 -> do
returnA -< Num 1
-- v1 <- eval -< e1
-- v2 <- eval -< e2
-- case (v1, v2) of
-- (Num x, Num y) -> returnA -< Num (x + y)
*Interp> unpack eval M.empty (Lit 1)
(Right (Num 1),fromList [])
Here's the code in question
The arrow used is a sort of state function that keeps passing context after failure.
{-# LANGUAGE Arrows #-}
{-# OPTIONS_GHC -Wall #-}
module Interp where
import Prelude hiding (lookup, fail)
import qualified Data.Map as M
import Control.Arrow
import Control.Category
data Expr
= Lit Int
| Var String
| Add Expr Expr
deriving (Show, Eq)
data Val
= Num Int
deriving (Show, Eq)
type Env = M.Map String Val
data A b c = A { unpack :: (Env -> b -> (Either String c, Env)) }
instance Category A where
id = A (\env b -> (Right b, env))
A g . A f = A $ \env b -> case f env b of
(Left err, env') -> (Left err, env')
(Right c, env') -> g env' c
instance Arrow A where
arr f = A $ \env b -> (Right (f b), env)
first (A f) = A $ \env (b, d) -> case f env b of
(Left err, env') -> (Left err, env')
(Right c, env') -> (Right (c, d), env')
instance ArrowChoice A where
left (A f) = A $ \env e -> case e of
Left b -> case f env b of
(Left err, env') -> (Left err, env')
(Right c, env') -> (Right (Left c), env')
Right d -> (Right (Right d), env)
lookup :: A String Val
lookup = A $ \env k -> case M.lookup k env of
Nothing -> (Left "Variable not bound", env)
Just v -> (Right v, env)
eval :: A Expr Val
eval = proc e -> case e of
Lit x -> returnA -< Num x
Var s -> do
lookup -< s
Add e1 e2 -> do
v1 <- eval -< e1
v2 <- eval -< e2
case (v1, v2) of
(Num x, Num y) -> returnA -< Num (x + y)
There are two ways you can fix the non-termination:
A
from a data
declaration to a newtype
declaration.Change the pattern matching in the definitions of first
and left
to be lazy, i.e.:
first ~(A f) = A $ ...same as before...
left ~(A f) = A $ ...same as before...
Both of these fixes address the same root issue. From another StackOverflow answer:
[With
data
declarations], when pattern matching against value constructors, [thunks] WILL be evaluated at least to weak head normal form (WHNF). [...] [Withnewtype
declarations,] when pattern matching against value constructors, [thunks] WILL NOT be evaluated at all.
To continue, I will explain a major difference between newtype
and data
declarations, and then I will explain how it applies to your case.
newtype
and data
Most of the following is paraphrased and adapted from the HaskellWiki article on the same topic.
newtype
is intended to introduce a type that is exactly isomorphic to an existing type. Given a declaration newtype T1 = T1 { unpack :: Int }
, we would like unpack . T1
and id
to be equal at the type Int -> Int
, and likewise for T1 . unpack
and id
to be equal at the type T1 -> T1
.
But why does this not already hold for data T2 = T2 { unpack :: Int }
; that is, why is T2 . unpack
not the same as id
? The reason is non-termination. T2 (unpack _|_)
evaluates to T2 _|_
, but id _|_
evaluates to _|_
(where I'm using _|_
to stand for the non-terminating computation "bottom", as is customary). We can distiguish _|_
and T2 _|_
with the following expression:
\x -> case x of T2 _ -> ()
Applying this lambda to _|_
yields _|_
, but applying this lambda to T2 _|_
yields ()
. That is, because the constructor T2
can possibly hold a non-terminating value, the language makes it possible to distinguish _|_
and T2 _|_
.
The newtype
declaration gives us a perhaps surprising property: T1 (unpack _|_)
evaluates to _|_
, but case _|_ of T1 _ -> ()
evaluates to ()
! That is, because T1 _|_
is not intended to be distinguishable from _|_
, the language does not force evaluation of values of type T1
when pattern matching against the T1
value constructor. And, as we'll see shortly, this property is important with recursive definitions.
There is a way to recover a newtype-like behavior for data
declarations: use lazy pattern matching. Given a function like:
f x = case x of ~(T2 y) -> \() -> y
you will find that both f (T2 _|_)
and f _|_
evaluate to a value (i.e., a function of type Unit -> Int
that would diverge as soon as it is applied to an argument). This is because the lazy pattern match is identical to the function:
f = \x -> case x of t -> \() -> unpack t
and so the evaluation of the thunk passed as x
is "deferred" in the sense that it is only evaluated when unpack t
is evaluated, in the body of the lambda \() -> unpack t
.
I will now desugar the arrow notation in your example and then identify the source of the non-termination.
To desugar the arrow notation, I used the arrowp
program. This is what it yields on your program:
$ arrowp-ext interp.hs
eval :: A Expr Val
eval
= (arr
(\ e ->
case e of
Lit x -> (Left) (x)
Var s -> (Right) ((Left) (s))
Add e1 e2 -> (Right) ((Right) ((e1, e2))))
>>>
(((arr (\ x -> Num x)) >>> (returnA)) |||
(((arr (\ s -> s)) >>> (lookup)) |||
(((first ((arr (\ e1 -> e1)) >>> (eval))) >>>
(arr (\ (v1, e2) -> (e2, v1))))
>>>
(first ((arr (\ e2 -> e2)) >>> (eval))) >>>
(arr
(\ (v2, v1) ->
case
case (v1, v2) of
(Num x, Num y) -> (x, y)
of
(x, y) -> Num (x + y)))
>>> (returnA)))))
You identified that eval
doesn't terminate when applied to an argument, but the problem is more serious than that: eval
diverges, period. This can be observed in ghci:
*Interp> eval `seq` 0
The proximal cause of the non-termination is the expression first ((arr (\ e1 -> e1)) >>> (eval))
, which is the same as first eval
. The first parameter in the definition of first
involves a strict pattern match against a value constructor (first (A f)
), and A
originates from a data declaration, and so, re-quoting @wonder.mice:
[With data declarations], when pattern matching against value constructors, [thunks] WILL be evaluated at least to weak head normal form (WHNF).
When the expression first eval
is encountered in the definition of eval
, the thunk for eval
has not yet been evaluated to WHNF, and so when first eval
attempts to evaluate its argument to WHNF, it will not terminate.
One way to fix this is to change A
to a newtype
declaration. ("1. Change A
from a data
declaration to a newtype
declaration.") Then,
...when pattern matching against value constructors, [thunks] WILL NOT be evaluated at all.
The other way to fix this is to make first
use a lazy pattern match instead of a strict one ("2. Change the pattern matching in the definitions of first
and left
to be lazy"). This will have the net effect of making the pattern match behave the same as for a newtype declaration, where "thunks WILL NOT be evaluated at all".
A similar explanation can be used for why |||
is not behaving in a lazy manner for your example, even though it does for e.g. the Arrow instance for (->)
. Changing left
to use a lazy pattern match will take care of this issue, or just using a newtype
declaration will be sufficient.