Forcing evaluation across an IORef: rnf, deepseq, or something else?


I have a long-running process which is forkIO'd, which produces pixel color values:

takesAgesToRun :: [[Color]]

myForkedProcess :: IORef [[Color]] -> IO ()
myForkedProcess ref = do let colors = takesAgesToRun
                         writeIORef ref colors

(where Color just holds three Double values).

As expected, when read on the "other side" of the IORef, the value that was stored is just a thunk, so forcing it there blocks the main thread.

I know I need to fully evaluate the [[Color]] value to normal form, but there seem to be two ways of achieving that, and further, I'm not sure how to incorporate either into my code.

How would I go about this? Do I use rnf, deepseq, or some other threading strategy? Is one of these the preferred one, and the others deprecated? And how does it fit into my code?

(PS please ignore the fact that storing the image as a list of a list of colors is stupid - this is just a simplified version of the code).


Solution

  • Use deepseq (from Control.DeepSeq). It's used just like seq. You would incorporate it like this:

    myForkedProcess :: IORef [[Color]] -> IO ()
    myForkedProcess ref = do let colors = takesAgesToRun
                             deepseq colors $ writeIORef ref colors
    

    This will force "colors" to be fully evaluated before the "writeIORef" call.
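
    For a version that can be compiled and run as-is, here is a minimal sketch of the whole flow. It swaps in evaluate (force ...) for the deepseq call, which has the same effect here and sequences the evaluation explicitly inside IO. The body of takesAgesToRun and the MVar used to wait for the worker are placeholders invented for the example, and the NFData instance is the kind this answer goes on to describe.

    ```haskell
    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.DeepSeq (NFData (..), force)
    import Control.Exception (evaluate)
    import Data.IORef (IORef, newIORef, readIORef, writeIORef)

    -- Stand-in for the question's Color type (three Doubles).
    data Color = Color Double Double Double

    instance NFData Color where
      rnf (Color r g b) = r `seq` g `seq` b `seq` ()

    -- Hypothetical placeholder for the expensive computation.
    takesAgesToRun :: [[Color]]
    takesAgesToRun =
      [ [ Color (fromIntegral x) (fromIntegral y) 0 | x <- [0 .. 2 :: Int] ]
      | y <- [0 .. 1 :: Int]
      ]

    -- Fully evaluate the image in the worker thread, then publish it.
    myForkedProcess :: IORef [[Color]] -> IO ()
    myForkedProcess ref = do
      colors <- evaluate (force takesAgesToRun)
      writeIORef ref colors

    main :: IO ()
    main = do
      ref <- newIORef []
      done <- newEmptyMVar -- assumption: only used here to wait for the worker
      _ <- forkIO (myForkedProcess ref >> putMVar done ())
      takeMVar done
      image <- readIORef ref
      print (length image, map length image)
    ```

    By the time the main thread reads the IORef, the list it gets back has already been evaluated by the worker, so the read itself does no heavy work.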

    In order for this to work, you will need an NFData instance for Color. Exactly how to write this depends upon the definition of Color, but here are two examples:

    -- just for reference
    data Color = Color Double Double Double
    
    instance NFData Color where
        rnf (Color r g b) = r `seq` g `seq` b `seq` ()
    
    -- closer to the likely actual implementation for Color
    data Color2 = Color2 !Double !Double !Double
    
    instance NFData Color2 where
        -- WHNF is enough: the strict fields are forced along with the constructor
        rnf c = c `seq` ()
    

    For the Color instance, you need to ensure that all components of the color are fully evaluated[1] whenever Color is. That's what the seqs do. We can use seq instead of deepseq here because we know that each component is a Double, so seq evaluates it fully. If a component were a more complex data type, we would need to use deepseq (or rnf) on it when writing the NFData instance.

    In Color2 it's a bit simpler. Because of the strictness annotations (the ! on each field), the components are evaluated whenever the Color2 constructor is. Evaluating a Color2 to weak head normal form therefore evaluates it fully, and that is all rnf has to do. Older versions of deepseq provided exactly this as the default rnf, so an empty instance body was enough; since deepseq 1.4 the default is derived through GHC.Generics and needs a Generic instance, so it is safest to write it out.
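
    With deepseq 1.4 or later there is also the option of not writing rnf by hand at all: derive Generic and leave the instance body empty. The following is only a sketch of that approach; Color3 is a hypothetical name used to keep it apart from the two types above.

    ```haskell
    {-# LANGUAGE DeriveGeneric #-}

    import Control.DeepSeq (NFData, deepseq)
    import GHC.Generics (Generic)

    -- Hypothetical third variant, named Color3 only for this sketch.
    data Color3 = Color3 Double Double Double
      deriving (Show, Generic)

    -- Empty instance body: rnf comes from the Generic-based default,
    -- which forces every field of the constructor.
    instance NFData Color3

    main :: IO ()
    main = do
      let c = Color3 (1 / 3) (2 / 3) 1
      -- Force all three components before printing.
      c `deepseq` print c
    ```

    This trades a little boilerplate for a language extension, and it keeps working if fields are added to the type later.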

    rnf is mainly useful when used in combination with Control.Parallel.Strategies. Here's how deepseq is defined:

    deepseq :: NFData a => a -> b -> b
    deepseq a b = rnf a `seq` b
    

    All deepseq does is call rnf and make sure that its result, (), is evaluated before b is returned. Forcing that () is really the only way to use rnf directly.

    [1] Haskell provides only two general ways to evaluate stuff: pattern matching and seq. Everything else is built upon one or both of these. For the NFData Color instance, Color is first evaluated to WHNF by pattern matching with the Color constructor, then the components are evaluated via seq.
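
    The difference between stopping at the constructor and going on to the components can be observed directly. In this sketch, badColor and probe are names made up for the demonstration; the red component is a thunk that fails when forced, so we can tell whether it was ever evaluated.

    ```haskell
    import Control.DeepSeq (NFData (..), deepseq)
    import Control.Exception (ErrorCall, evaluate, try)

    data Color = Color Double Double Double

    instance NFData Color where
      rnf (Color r g b) = r `seq` g `seq` b `seq` ()

    -- A Color whose first component is a thunk that fails when forced.
    badColor :: Color
    badColor = Color (error "red was forced") 0 0

    -- Force a unit value and report whether the failing component was reached.
    probe :: String -> () -> IO ()
    probe label unit = do
      result <- try (evaluate unit) :: IO (Either ErrorCall ())
      putStrLn (label ++ ": " ++ describe result)
      where
        describe (Left _) = "component forced"
        describe (Right _) = "stopped at the constructor"

    main :: IO ()
    main = do
      probe "seq" (badColor `seq` ())
      probe "deepseq" (badColor `deepseq` ())
    ```

    Here seq only evaluates badColor to WHNF and never touches the red component, while deepseq goes through rnf, pattern matches on the constructor and then hits the failing thunk via seq.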

    Of course there is also a third, highly specialized, way to evaluate stuff: a program's main :: IO () is executed by the runtime in order to produce its ().