In Parallel and Concurrent Programming in Haskell by Simon Marlow, at pages 133 and 134 the following code is shown:
type Name = String
type PhoneNumber = String
type PhoneBook = Map Name PhoneNumber
newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
lookup (PhoneBookState m) name = do
book <- takeMVar m
putMVar m book
return (Map.lookup name book)
and next page a few observations are made. The final part of the second paragraph puzzles me a bit though. Here's the first half of the paragraph:
Using immutable data structures in a mutable wrapper has further benefits. Note that in the
lookup
operation, we simply grabbed the current value of the state and then the complexMap.lookup
operation takes place outside of thetakeMVar
/putMVar
sequence. This is good for concurrency, because it means the lock is held only for a very short time.
This is fairly clear to me, in the sense that I understand that takeMVar
puts the lock, and putMVar
immediately releases it, so that following concurrent accesses to the PhoneBookState
are not hindered any further. At the same time, I also see that further mutations of the MVar
are possible, so by the time lookup
returns with a Just smth
, that (name, smth)
pair might have been removed from the map (or if it returned Nothing
, some (name, smth)
might have been added to the map).
However, I don't get what the following part of the paragraph tries to convey:
This is possible only because the value of the state is immutable. If the data structure were mutable, we would have to hold the lock while operating on it.⁵
⁵. The other option is to use a lock-free algorithm, which is enormously complex and difficult to get right.
What does that mean? Is the author referring to the case that Map
exposed some API to mutated? That's not even possible, unless we are in IO
, right? Or is he referring to mutations performed on m
's content (by other threads after putMVar
has released the lock) somehow affecting book
's conent?
Here's a full example that may help illustrate the problem. I'm sure you'll understand if I don't want to write an entire mutable map implementation to answer this question, but consider the following simplified mutable map that is only capable of storing one key-value pair. It supports "new", "lookup", and "replace" operations. (I named it "replace" instead of "insert" because the map can only hold one key-value pair, so the previous entry gets lost when a new one is inserted.)
data MMap k v = MMap (IORef k) (IORef v)
mmapNew :: k -> v -> IO (MMap k v)
mmapNew k v = MMap <$> newIORef k <*> newIORef v
mmapReplace :: k -> v -> MMap k v -> IO ()
mmapReplace k v (MMap k_ v_) = do
writeIORef k_ k
writeIORef v_ v
mmapLookup :: (Eq k) => k -> MMap k v -> IO (Maybe v)
mmapLookup k (MMap k'_ v_) = do
k' <- readIORef k'_
if k == k'
then Just <$> readIORef v_
else pure Nothing
Now, suppose we want to write a thread-safe, concurrent interface to a phone book using this mutable data structure. The revised code for lookup
is more or less the same as the original:
type Name = String
type PhoneNumber = String
type PhoneBook = MMap Name PhoneNumber
newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
lookup (PhoneBookState m) name = do
book <- takeMVar m
putMVar m book
mmapLookup name book -- no `return`, because this is an IO operation
where we perform the actual mmapLookup
after unlocking the state.
The corresponding API code for new
and replace
would presumably be something like this:
new :: Name -> PhoneNumber -> IO PhoneBookState
new name number = do
book <- mmapNew name number
PhoneBookState <$> newMVar book
replace :: PhoneBookState -> Name -> PhoneNumber -> IO ()
replace (PhoneBookState m) name number = do
book <- takeMVar m
mmapReplace name number book
putMVar m book
Note that, unlike with lookup
, we've taken a conservative approach with replace
, holding the lock throughout the replacement operation.
However, this implementation has a race condition for the lookup
/replace
pair. To see why, let's add a delay to mmapLookup
:
mmapLookup :: (Eq k) => k -> MMap k v -> IO (Maybe v)
mmapLookup k (MMap k'_ v_) = do
k' <- readIORef k'_
threadDelay 2000000
if k == k'
then Just <$> readIORef v_
else pure Nothing
and consider the following main
function, which creates a phone book for "Alice"'s phone number, and then launches two threads -- one to look up "Alice" and another to replace the entry with "Bob"'s number:
main = do
pbs <- new "Alice" "555-1234"
forkIO $ do
aliceno <- lookup pbs "Alice"
putStrLn $ "Alice's number is " ++ show aliceno
threadDelay 1000000
forkIO $ replace pbs "Bob" "555-9876"
threadDelay 5000000
putStrLn "Done!"
When this program runs, it incorrectly retrieves Bob's number in the lookup for Alice:
Alice's number is Just "555-9876"
Done!
Obviously, the problem is that the replacement operation takes place between the comparison of the keys and the retrieval of the value.
If you modify lookup
to release the lock only after the lookup completes:
lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
lookup (PhoneBookState m) name = do
book <- takeMVar m
number <- mmapLookup name book
putMVar m book
return number
then there will be no race condition. No matter how you interleave the operations (and no matter how you carefully place delays), you will either retrieve Alice's number of Just "555-1234"
from the original phone book or fail to retrieve Alice's number and return Nothing
from the revised phone book (where it's been replaced by Bob's number).
So, for a mutable MMap
, we have to hold the lock around the mmapLookup
operation to safely operate on the data structure without allowing any other threads to modify the data structure out from under us while we are performing the lookup
.
In contrast, if we were to use an analogous immutable data structure:
data IMap k v = IMap k v
imapNew :: k -> v -> IMap k v
imapNew k v = IMap k v
imapReplace :: k -> v -> IMap k v -> IMap k v
imapReplace k v _ = IMap k v
imapLookup :: (Eq k) => k -> IMap k v -> Maybe v
imapLookup k (IMap k' v) | k == k' = Just v
| otherwise = Nothing
type Name = String
type PhoneNumber = String
type PhoneBook = IMap Name PhoneNumber
newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
then the corresponding implementations of new
, lookup
, and replace
would be:
new :: Name -> PhoneNumber -> IO PhoneBookState
new name number = PhoneBookState <$> newMVar (imapNew name number)
lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
lookup (PhoneBookState m) name = do
book <- takeMVar m
putMVar m book
return $ imapLookup name book
replace :: PhoneBookState -> Name -> PhoneNumber -> IO ()
replace (PhoneBookState m) name number = do
book <- takeMVar m
let book' = imapReplace name number book
putMVar m book'
Critically, when replace
takes book
out of the MVar
, which might very well be the same book
concurrently being accessed by the imapLookup
call in lookup
, it does not mutate book
itself but rather creates a new book'
via the imapReplace
call without affecting the original book
. This is the essence of immutable data structures, and it's why the imapLookup
operation can safety continue to run on the unmodified book
even as replace
is creating a new book'
and writing it back into the vacant MVar
.
Even if we went out of our way to rewrite imapLookup
as an IO
operation with ample opportunity for another thread to run between the key and value access:
imapLookup :: (Eq k) => k -> IMap k v -> IO (Maybe v)
imapLookup k m = do
k' <- evaluate $ (\(IMap k v) -> k) m -- force retrieval of the key
threadDelay 1000000 -- allow another thread to run
v' <- evaluate $ (\(IMap k v) -> v) m -- fetch the value
pure $ if k == k' then Just v' else Nothing
we still wouldn't trigger a race condition, because the book :: IMap k v
being accessed by imapLookup name book
is immutable, regardless of how it is being accessed in other threads.
Here's the full example illustrating the race condition with a mutable MMap
structure:
import Prelude hiding (lookup)
import Data.IORef
import Control.Concurrent
data MMap k v = MMap (IORef k) (IORef v)
mmapNew :: k -> v -> IO (MMap k v)
mmapNew k v = MMap <$> newIORef k <*> newIORef v
mmapReplace :: k -> v -> MMap k v -> IO ()
mmapReplace k v (MMap k_ v_) = do
writeIORef k_ k
writeIORef v_ v
mmapLookup :: (Eq k) => k -> MMap k v -> IO (Maybe v)
mmapLookup k (MMap k'_ v_) = do
k' <- readIORef k'_
threadDelay 2000000 -- allow another thread to do mischief
if k == k'
then Just <$> readIORef v_
else pure Nothing
type Name = String
type PhoneNumber = String
type PhoneBook = MMap Name PhoneNumber
newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
new :: Name -> PhoneNumber -> IO PhoneBookState
new name number = do
book <- mmapNew name number
PhoneBookState <$> newMVar book
lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
lookup (PhoneBookState m) name = do
book <- takeMVar m
putMVar m book
mmapLookup name book
replace :: PhoneBookState -> Name -> PhoneNumber -> IO ()
replace (PhoneBookState m) name number = do
book <- takeMVar m
mmapReplace name number book
putMVar m book
main = do
pbs <- new "Alice" "555-1234"
forkIO $ do
aliceno <- lookup pbs "Alice"
putStrLn $ "Alice's number is " ++ show aliceno
threadDelay 1000000
forkIO $ replace pbs "Bob" "555-9876"
threadDelay 2000000 -- wait for children to finish
putStrLn "Done!"
and the immutable version without the race condition:
module ImmutableMap where
import qualified Data.Map as Map
import Prelude hiding (lookup)
import Control.Concurrent
import Control.Exception
data IMap k v = IMap k v
imapNew :: k -> v -> IMap k v
imapNew k v = IMap k v
imapReplace :: k -> v -> IMap k v -> IMap k v
imapReplace k v _ = IMap k v
imapLookup :: (Eq k) => k -> IMap k v -> IO (Maybe v)
imapLookup k m = do
k' <- evaluate $ (\(IMap k v) -> k) m -- force retrieval of the key
threadDelay 1000000 -- allow another thread to run
v' <- evaluate $ (\(IMap k v) -> v) m -- fetch the value
pure $ if k == k' then Just v' else Nothing
type Name = String
type PhoneNumber = String
type PhoneBook = IMap Name PhoneNumber
newtype PhoneBookState = PhoneBookState (MVar PhoneBook)
new :: Name -> PhoneNumber -> IO PhoneBookState
new name number = PhoneBookState <$> newMVar (imapNew name number)
lookup :: PhoneBookState -> Name -> IO (Maybe PhoneNumber)
lookup (PhoneBookState m) name = do
book <- takeMVar m
putMVar m book
imapLookup name book
replace :: PhoneBookState -> Name -> PhoneNumber -> IO ()
replace (PhoneBookState m) name number = do
book <- takeMVar m
let book' = imapReplace name number book
putMVar m book'
main = do
pbs <- new "Alice" "555-1234"
forkIO $ do
aliceno <- lookup pbs "Alice"
putStrLn $ "Alice's number is " ++ show aliceno
threadDelay 1000000
forkIO $ replace pbs "Bob" "555-9876"
threadDelay 2000000 -- wait for children to finish
putStrLn "Done!"