haskellnewtypearrow-abstraction

Why `data` cause an infinite loop while `newtype` not


I am learning Arrow following the tutorial programming with arrows. I've typed the following code according to the paper except that the SF is defined by data, not by newtype as in the paper (actually, I made this change by chance, since I typed the code from memory):

import Control.Category
import Control.Arrow
import Prelude hiding (id, (.))

data SF a b = SF { runSF :: [a] -> [b] }  -- this is the change, using data instead of newtype as in the paper 

-- The folowing code is the same as in the paper
instance Category SF where
  id = SF $ \x -> x
  (SF f) . (SF g) = SF $ \x -> f (g x)

instance Arrow SF where
  arr f = SF $ map f
  first (SF f) = SF $ unzip >>> first f >>> uncurry zip

instance ArrowChoice SF where
  left (SF f) = SF $ \xs -> combine xs (f [y | Left y <- xs])
    where
      combine (Left _ : ys) (z:zs) = Left z : combine ys zs
      combine (Right y : ys) zs = Right y : combine ys zs
      combine [] _ = []

delay :: a -> SF a a
delay x = SF $ init . (x:)

mapA :: ArrowChoice a => a b c -> a [b] [c]
mapA f = arr listcase >>>
         arr (const []) ||| (f *** mapA f >>> arr (uncurry (:)))

listcase :: [a] -> Either () (a, [a])
listcase [] = Left ()
listcase (x:xs) = Right (x, xs)

When I load the file in ghci and execute runSF (mapA (delay 0)) [[1,2,3],[4,5,6]], it triggers an infinit loop and runs out of memory finally. If I change data back to newtype, everything is OK. The same problem happens in ghc 8.0.2, 8.2.2 and 8.6.3.

The same problem also exists even I compile the code into an executable.

I have thought the difference between data and newtype, when defining a data structure with only one field, is the runtime cost. But this problem seems to imply more difference between them. Or there may be something that I haven't noticed about the Arrow type-class.

Can anyone have any ideas? Thanks very much!


Solution

  • Let's look at this example.

    data A = A [Int]
        deriving (Show)
    
    cons :: Int -> A -> A
    cons x (A xs) = A (x:xs)
    
    ones :: A
    ones = cons 1 ones
    

    We would expect that ones should be A [1,1,1,1...], because all we have done is wrap a list in a data constructor. But we would be wrong. Recall that pattern matches are strict for data constructors. That is, cons 1 undefined = undefined rather than A (1 : undefined). So when we try to evaluate ones, cons pattern matches on its second argument, which causes us to evaluate ones... we have a problem.

    newtypes don't do this. At runtime newtype constructors are invisible, so it's as if we had written the equivalent program on plain lists

    cons :: Int -> [Int] -> [Int]
    cons x ys = x:ys
    
    ones = cons 1 ones
    

    which is perfectly productive, since when we try to evaluate ones, there is a : constructor between us and the next evaluation of ones.

    You can get back the newtype semantics by making your data constructor pattern matches lazy:

    cons x ~(A xs) = A (x:xs)
    

    This is the problem with your code (I have run into this exact problem doing this exact thing). There are a few reasons data pattern matches are strict by default; the most compelling one I see is that pattern matching would otherwise be impossible if the type had more than one constructor. There is also a small runtime overhead to lazy pattern matching in order to fix some subtle GC leaks; details linked in the comments.