haskelliomonadsinfinite-looplazy-evaluation

How do I handle an infinite list of IO objects in Haskell?


I'm writing a program that reads from a list of files. The each file either contains a link to the next file or marks that it's the end of the chain.

Being new to Haskell, it seemed like the idiomatic way to handle this is is a lazy list of possible files to this end, I have

getFirstFile :: String -> DataFile
getNextFile :: Maybe DataFile -> Maybe DataFile

loadFiles :: String -> [Maybe DataFile]
loadFiles = iterate getNextFile . Just . getFirstFile

getFiles :: String -> [DataFile]
getFiles = map fromJust . takeWhile isJust . loadFiles

So far, so good. The only problem is that, since getFirstFile and getNextFile both need to open files, I need their results to be in the IO monad. This gives the modified form of

getFirstFile :: String -> IO DataFile
getNextFile :: Maybe DataFile -> IO (Maybe DataFile)

loadFiles :: String -> [IO Maybe DataFile]
loadFiles = iterate (getNextFile =<<) . Just . getFirstFile

getFiles :: String -> IO [DataFile]
getFiles = liftM (map fromJust . takeWhile isJust) . sequence . loadFiles

The problem with this is that, since iterate returns an infinite list, sequence becomes an infinite loop. I'm not sure how to proceed from here. Is there a lazier form of sequence that won't hit all of the list elements? Should I be rejiggering the map and takeWhile to be operating inside the IO monad for each list element? Or do I need to drop the whole infinite list process and write a recursive function to terminate the list manually?


Solution

  • A step in the right direction

    What puzzles me is getNextFile. Step into a simplified world with me, where we're not dealing with IO yet. The type is Maybe DataFile -> Maybe DataFile. In my opinion, this should simply be DataFile -> Maybe DataFile, and I will operate under the assumption that this adjustment is possible. And that looks like a good candidate for unfoldr. The first thing I am going to do is make my own simplified version of unfoldr, which is less general but simpler to use.

    import Data.List
    
    -- unfoldr :: (b -> Maybe (a,b)) -> b -> [a]
    myUnfoldr :: (a -> Maybe a) -> a -> [a]
    myUnfoldr f v = v : unfoldr (fmap tuplefy . f) v
      where tuplefy x = (x,x)
    

    Now the type f :: a -> Maybe a matches getNextFile :: DataFile -> Maybe DataFile

    getFiles :: String -> [DataFile]
    getFiles = myUnfoldr getNextFile . getFirstFile
    

    Beautiful, right? unfoldr is a lot like iterate, except once it hits Nothing, it terminates the list.

    Now, we have a problem. IO. How can we do the same thing with IO thrown in there? Don't even think about The Function Which Shall Not Be Named. We need a beefed up unfoldr to handle this. Fortunately, the source for unfoldr is available to us.

    unfoldr      :: (b -> Maybe (a, b)) -> b -> [a]
    unfoldr f b  =
      case f b of
       Just (a,new_b) -> a : unfoldr f new_b
       Nothing        -> []
    

    Now what do we need? A healthy dose of IO. liftM2 unfoldr almost gets us the right type, but won't quite cut it this time.

    An actual solution

    unfoldrM :: Monad m => (b -> m (Maybe (a, b))) -> b -> m [a]
    unfoldrM f b = do
      res <- f b
      case res of
        Just (a, b') -> do
          bs <- unfoldrM f b'
          return $ a : bs
        Nothing -> return []
    

    It is a rather straightforward transformation; I wonder if there is some combinator that could accomplish the same.

    Fun fact: we can now define unfoldr f b = runIdentity $ unfoldrM (return . f) b

    Let's again define a simplified myUnfoldrM, we just have to sprinkle in a liftM in there:

    myUnfoldrM :: Monad m => (a -> m (Maybe a)) -> a -> m [a]
    myUnfoldrM f v = (v:) `liftM` unfoldrM (liftM (fmap tuplefy) . f) v
      where tuplefy x = (x,x)
    

    And now we're all set, just like before.

    getFirstFile :: String -> IO DataFile
    getNextFile :: DataFile -> IO (Maybe DataFile)
    
    getFiles :: String -> IO [DataFile]
    getFiles str = do
      firstFile <- getFirstFile str
      myUnfoldrM getNextFile firstFile
    
    -- alternatively, to make it look like before
    getFiles' :: String -> IO [DataFile]
    getFiles' = myUnfoldrM getNextFile <=< getFirstFile
    

    By the way, I typechecked all of these with data DataFile = NoClueWhatGoesHere, and the type signatures for getFirstFile and getNextFile, with their definitions set to undefined.


    [edit] changed myUnfoldr and myUnfoldrM to behave more like iterate, including the initial value in the list of results.

    [edit] Additional insight on unfolds:

    If you have a hard time wrapping your head around unfolds, the Collatz sequence is possibly one of the simplest examples.

    collatz :: Integral a => a -> Maybe a
    collatz 1 = Nothing -- the sequence ends when you hit 1
    collatz n | even n    = Just $ n `div` 2
              | otherwise = Just $ 3 * n + 1
    
    collatzSequence :: Integral a => a -> [a]
    collatzSequence = myUnfoldr collatz
    

    Remember, myUnfoldr is a simplified unfold for the cases where the "next seed" and the "current output value" are the same, as is the case for collatz. This behavior should be easy to see given myUnfoldr's simple definition in terms of unfoldr and tuplefy x = (x,x).

    ghci> collatzSequence 9
    [9,28,14,7,22,11,34,17,52,26,13,40,20,10,5,16,8,4,2,1]
    

    More, mostly unrelated thoughts

    The rest has absolutely nothing to do with the question, but I just couldn't resist musing. We can define myUnfoldr in terms of myUnfoldrM:

    myUnfoldr f v = runIdentity $ myUnfoldrM (return . f) v
    

    Look familiar? We can even abstract this pattern:

    sinkM :: ((a -> Identity b) -> a -> Identity c) -> (a -> b) -> a -> c
    sinkM hof f = runIdentity . hof (return . f)
    
    unfoldr = sinkM unfoldrM
    myUnfoldr = sinkM myUnfoldrM
    

    sinkM should work to "sink" (opposite of "lift") any function of the form

    Monad m => (a -> m b) -> a -> m c.

    since the Monad m in those functions can be unified with the Identity monad constraint of sinkM. However, I don't see anything that sinkM would actually be useful for.