haskellconcurrencyfunctional-programmingimmutability

Clarification on the importance of the immutability of the state inside an MVar in the context of concurrency


In Parallel and Concurrent Programming in Haskell by Simon Marlow, at pages 133 and 134 the following code is shown:

type Name        = String
type PhoneNumber = String
type PhoneBook   = Map Name PhoneNumber

newtype PhoneBookState = PhoneBookState (MVar PhoneBook)

lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
lookup (PhoneBookState m) name = do
  book <- takeMVar m
  putMVar m book
  return (Map.lookup name book)

and next page a few observations are made. The final part of the second paragraph puzzles me a bit though. Here's the first half of the paragraph:

Using immutable data structures in a mutable wrapper has further benefits. Note that in the lookup operation, we simply grabbed the current value of the state and then the complex Map.lookup operation takes place outside of the takeMVar/putMVar sequence. This is good for concurrency, because it means the lock is held only for a very short time.

This is fairly clear to me, in the sense that I understand that takeMVar puts the lock, and putMVar immediately releases it, so that following concurrent accesses to the PhoneBookState are not hindered any further. At the same time, I also see that further mutations of the MVar are possible, so by the time lookup returns with a Just smth, that (name, smth) pair might have been removed from the map (or if it returned Nothing, some (name, smth) might have been added to the map).

However, I don't get what the following part of the paragraph tries to convey:

This is possible only because the value of the state is immutable. If the data structure were mutable, we would have to hold the lock while operating on it.⁵


⁵. The other option is to use a lock-free algorithm, which is enormously complex and difficult to get right.

What does that mean? Is the author referring to the case that Map exposed some API to mutated? That's not even possible, unless we are in IO, right? Or is he referring to mutations performed on m's content (by other threads after putMVar has released the lock) somehow affecting book's conent?


Solution

  • Here's a full example that may help illustrate the problem. I'm sure you'll understand if I don't want to write an entire mutable map implementation to answer this question, but consider the following simplified mutable map that is only capable of storing one key-value pair. It supports "new", "lookup", and "replace" operations. (I named it "replace" instead of "insert" because the map can only hold one key-value pair, so the previous entry gets lost when a new one is inserted.)

    data MMap k v = MMap (IORef k) (IORef v)
    
    mmapNew :: k -> v -> IO (MMap k v)
    mmapNew k v = MMap <$> newIORef k <*> newIORef v
    
    mmapReplace :: k -> v -> MMap k v -> IO ()
    mmapReplace k v (MMap k_ v_) = do
      writeIORef k_ k
      writeIORef v_ v
    
    mmapLookup :: (Eq k) => k -> MMap k v -> IO (Maybe v)
    mmapLookup k (MMap k'_ v_) = do
      k' <- readIORef k'_
      if k == k'
        then Just <$> readIORef v_
        else pure Nothing
    

    Now, suppose we want to write a thread-safe, concurrent interface to a phone book using this mutable data structure. The revised code for lookup is more or less the same as the original:

    type Name = String
    type PhoneNumber = String
    type PhoneBook = MMap Name PhoneNumber
    newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
    
    lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
    lookup (PhoneBookState m) name = do
      book <- takeMVar m
      putMVar m book
      mmapLookup name book   -- no `return`, because this is an IO operation
    

    where we perform the actual mmapLookup after unlocking the state.

    The corresponding API code for new and replace would presumably be something like this:

    new :: Name -> PhoneNumber -> IO PhoneBookState
    new name number = do
      book <- mmapNew name number
      PhoneBookState <$> newMVar book
    
    replace :: PhoneBookState -> Name -> PhoneNumber -> IO ()
    replace (PhoneBookState m) name number = do
      book <- takeMVar m
      mmapReplace name number book
      putMVar m book
    

    Note that, unlike with lookup, we've taken a conservative approach with replace, holding the lock throughout the replacement operation.

    However, this implementation has a race condition for the lookup/replace pair. To see why, let's add a delay to mmapLookup:

    mmapLookup :: (Eq k) => k -> MMap k v -> IO (Maybe v)
    mmapLookup k (MMap k'_ v_) = do
      k' <- readIORef k'_
      threadDelay 2000000
      if k == k'
        then Just <$> readIORef v_
        else pure Nothing
    

    and consider the following main function, which creates a phone book for "Alice"'s phone number, and then launches two threads -- one to look up "Alice" and another to replace the entry with "Bob"'s number:

    main = do
      pbs <- new "Alice" "555-1234"
      forkIO $ do
        aliceno <- lookup pbs "Alice"
        putStrLn $ "Alice's number is " ++ show aliceno
      threadDelay 1000000
      forkIO $ replace pbs "Bob" "555-9876"
      threadDelay 5000000
      putStrLn "Done!"
    

    When this program runs, it incorrectly retrieves Bob's number in the lookup for Alice:

    Alice's number is Just "555-9876"
    Done!
    

    Obviously, the problem is that the replacement operation takes place between the comparison of the keys and the retrieval of the value.

    If you modify lookup to release the lock only after the lookup completes:

    lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
    lookup (PhoneBookState m) name = do
      book <- takeMVar m
      number <- mmapLookup name book
      putMVar m book
      return number
    

    then there will be no race condition. No matter how you interleave the operations (and no matter how you carefully place delays), you will either retrieve Alice's number of Just "555-1234" from the original phone book or fail to retrieve Alice's number and return Nothing from the revised phone book (where it's been replaced by Bob's number).

    So, for a mutable MMap, we have to hold the lock around the mmapLookup operation to safely operate on the data structure without allowing any other threads to modify the data structure out from under us while we are performing the lookup.

    In contrast, if we were to use an analogous immutable data structure:

    data IMap k v = IMap k v
    
    imapNew :: k -> v -> IMap k v
    imapNew k v = IMap k v
    
    imapReplace :: k -> v -> IMap k v -> IMap k v
    imapReplace k v _ = IMap k v
    
    imapLookup :: (Eq k) => k -> IMap k v -> Maybe v
    imapLookup k (IMap k' v) | k == k' = Just v
                             | otherwise = Nothing
    
    
    type Name = String
    type PhoneNumber = String
    type PhoneBook = IMap Name PhoneNumber
    newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
    

    then the corresponding implementations of new, lookup, and replace would be:

    new :: Name -> PhoneNumber -> IO PhoneBookState
    new name number = PhoneBookState <$> newMVar (imapNew name number)
    
    lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
    lookup (PhoneBookState m) name = do
      book <- takeMVar m
      putMVar m book
      return $ imapLookup name book
    
    replace :: PhoneBookState -> Name -> PhoneNumber -> IO ()
    replace (PhoneBookState m) name number = do
      book <- takeMVar m
      let book' = imapReplace name number book
      putMVar m book'
    

    Critically, when replace takes book out of the MVar, which might very well be the same book concurrently being accessed by the imapLookup call in lookup, it does not mutate book itself but rather creates a new book' via the imapReplace call without affecting the original book. This is the essence of immutable data structures, and it's why the imapLookup operation can safety continue to run on the unmodified book even as replace is creating a new book' and writing it back into the vacant MVar.

    Even if we went out of our way to rewrite imapLookup as an IO operation with ample opportunity for another thread to run between the key and value access:

    imapLookup :: (Eq k) => k -> IMap k v -> IO (Maybe v)
    imapLookup k m = do
      k' <- evaluate $ (\(IMap k v) -> k) m    -- force retrieval of the key
      threadDelay 1000000                      -- allow another thread to run
      v' <- evaluate $ (\(IMap k v) -> v) m    -- fetch the value
      pure $ if k == k' then Just v' else Nothing
    

    we still wouldn't trigger a race condition, because the book :: IMap k v being accessed by imapLookup name book is immutable, regardless of how it is being accessed in other threads.

    Here's the full example illustrating the race condition with a mutable MMap structure:

    import Prelude hiding (lookup)
    import Data.IORef
    import Control.Concurrent
    
    data MMap k v = MMap (IORef k) (IORef v)
    
    mmapNew :: k -> v -> IO (MMap k v)
    mmapNew k v = MMap <$> newIORef k <*> newIORef v
    
    mmapReplace :: k -> v -> MMap k v -> IO ()
    mmapReplace k v (MMap k_ v_) = do
      writeIORef k_ k
      writeIORef v_ v
    
    mmapLookup :: (Eq k) => k -> MMap k v -> IO (Maybe v)
    mmapLookup k (MMap k'_ v_) = do
      k' <- readIORef k'_
      threadDelay 2000000  -- allow another thread to do mischief
      if k == k'
        then Just <$> readIORef v_
        else pure Nothing
    
    type Name = String
    type PhoneNumber = String
    type PhoneBook = MMap Name PhoneNumber
    newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
    
    new :: Name -> PhoneNumber -> IO PhoneBookState
    new name number = do
      book <- mmapNew name number
      PhoneBookState <$> newMVar book
    
    lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
    lookup (PhoneBookState m) name = do
      book <- takeMVar m
      putMVar m book
      mmapLookup name book
    
    replace :: PhoneBookState -> Name -> PhoneNumber -> IO ()
    replace (PhoneBookState m) name number = do
      book <- takeMVar m
      mmapReplace name number book
      putMVar m book
    
    main = do
      pbs <- new "Alice" "555-1234"
      forkIO $ do
        aliceno <- lookup pbs "Alice"
        putStrLn $ "Alice's number is " ++ show aliceno
      threadDelay 1000000
      forkIO $ replace pbs "Bob" "555-9876"
      threadDelay 2000000  -- wait for children to finish
      putStrLn "Done!"
    

    and the immutable version without the race condition:

    module ImmutableMap where
    
    import qualified Data.Map as Map
    import Prelude hiding (lookup)
    import Control.Concurrent
    import Control.Exception
    
    data IMap k v = IMap k v
    
    imapNew :: k -> v -> IMap k v
    imapNew k v = IMap k v
    
    imapReplace :: k -> v -> IMap k v -> IMap k v
    imapReplace k v _ = IMap k v
    
    imapLookup :: (Eq k) => k -> IMap k v -> IO (Maybe v)
    imapLookup k m = do
      k' <- evaluate $ (\(IMap k v) -> k) m    -- force retrieval of the key
      threadDelay 1000000                      -- allow another thread to run
      v' <- evaluate $ (\(IMap k v) -> v) m    -- fetch the value
      pure $ if k == k' then Just v' else Nothing
    
    type Name = String
    type PhoneNumber = String
    type PhoneBook = IMap Name PhoneNumber
    newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
    
    new :: Name -> PhoneNumber -> IO PhoneBookState
    new name number = PhoneBookState <$> newMVar (imapNew name number)
    
    lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
    lookup (PhoneBookState m) name = do
      book <- takeMVar m
      putMVar m book
      imapLookup name book
    
    replace :: PhoneBookState -> Name -> PhoneNumber -> IO ()
    replace (PhoneBookState m) name number = do
      book <- takeMVar m
      let book' = imapReplace name number book
      putMVar m book'
    
    main = do
      pbs <- new "Alice" "555-1234"
      forkIO $ do
        aliceno <- lookup pbs "Alice"
        putStrLn $ "Alice's number is " ++ show aliceno
      threadDelay 1000000
      forkIO $ replace pbs "Bob" "555-9876"
      threadDelay 2000000  -- wait for children to finish
      putStrLn "Done!"