I am experimenting with multiprocessing in Python. I wrote some code that requires concurrent modification of 3 different variables (a dict, a float and an int), shared across the different processes. My understanding of how locking works tells me that if I have 3 shared variables, it should be more efficient to assign a lock to each one. After all, why should process 2 wait to modify variable A just because process 1 is modifying variable B? If you need to lock variable B, then A should still be accessible to other processes. I ran the 2 toy examples below, based on a real program I'm writing, and to my surprise the code runs faster with a single lock!
Single lock: 2.1 seconds
import multiprocessing as mp
import numpy as np
import time

class ToyClass:
    def __init__(self, shared_a, shared_b):
        self.a = shared_a
        self.b = shared_b

    def update_a(self, key, n, lock):
        with lock:
            if key not in self.a:
                self.a[key] = np.zeros(4)
            # A Manager dict returns copies, so mutate locally and reassign
            arr = self.a[key]
            arr[n] += 1
            self.a[key] = arr

    def update_b(self, lock):
        with lock:
            self.b.value = max(0.1, self.b.value - 0.01)

def run_episode(toy, counter, lock):
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock)
    toy.update_b(lock)
    with lock:
        counter.value += 1

if __name__ == "__main__":
    num_episodes = 1000
    num_processes = 4
    t0 = time.time()
    with mp.Manager() as manager:
        shared_a = manager.dict()
        shared_b = manager.Value('d', 0.0)
        counter = manager.Value('i', 0)
        toy = ToyClass(shared_a=shared_a, shared_b=shared_b)
        # Single lock shared by all three variables
        lock = manager.Lock()
        pool = mp.Pool(processes=num_processes)
        for _ in range(num_episodes):
            pool.apply_async(run_episode, args=(toy, counter, lock))
        pool.close()
        pool.join()
    tf = time.time()
    print(f"Time to compute single lock: {tf - t0} seconds")
Multiple locks: 2.85 seconds!!
import multiprocessing as mp
import numpy as np
import time

class ToyClass:  ## Same definition as for single lock
    def __init__(self, shared_a, shared_b):
        self.a = shared_a
        self.b = shared_b

    def update_a(self, key, n, lock):
        with lock:
            if key not in self.a:
                self.a[key] = np.zeros(4)
            # A Manager dict returns copies, so mutate locally and reassign
            arr = self.a[key]
            arr[n] += 1
            self.a[key] = arr

    def update_b(self, lock):
        with lock:
            self.b.value = max(0.1, self.b.value - 0.01)

def run_episode(toy, counter, lock_a, lock_b, lock_count):
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock_a)
    toy.update_b(lock_b)
    with lock_count:
        counter.value += 1

if __name__ == "__main__":
    num_episodes = 1000
    num_processes = 4
    t0 = time.time()
    with mp.Manager() as manager:
        shared_a = manager.dict()
        shared_b = manager.Value('d', 0.0)
        counter = manager.Value('i', 0)
        toy = ToyClass(shared_a=shared_a, shared_b=shared_b)
        # 3 locks for 3 shared variables
        lock_a = manager.Lock()
        lock_b = manager.Lock()
        lock_count = manager.Lock()
        pool = mp.Pool(processes=num_processes)
        for _ in range(num_episodes):
            pool.apply_async(run_episode, args=(toy, counter, lock_a, lock_b, lock_count))
        pool.close()
        pool.join()
    tf = time.time()
    print(f"Time to compute multi-lock: {tf - t0} seconds")
What am I missing here? Is there a computational overhead to switching between locks that outweighs any potential benefit? These are just flags; how can that be?
Note: I know the code runs much faster with a single process/thread, but this is part of an experiment precisely to understand the downsides of multiprocessing.
This has nothing to do with the locking; you are just sending 3 locks per call instead of 1, which is 3 times the transmission overhead. To verify this you can send Manager.Value objects instead: the run still takes the same time as with 3 locks, so the locking part plays no role. You are just sending the shared objects over and over with every task.
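For instance, a minimal sketch of that check against the multi-lock example above (the dummy_b and dummy_count names are hypothetical, added only so that each task still transmits 3 shared objects while all contention is on a single lock):

def run_episode(toy, counter, lock, dummy_b, dummy_count):
    # Same work as the single-lock version, but two dummy Manager.Value
    # proxies ride along with every task, so 3 shared objects are
    # pickled and sent per call, exactly as in the 3-lock version.
    key = np.random.randint(100)
    n = np.random.randint(4)
    toy.update_a(key, n, lock)
    toy.update_b(lock)
    with lock:
        counter.value += 1
...
dummy_b = manager.Value('i', 0)
dummy_count = manager.Value('i', 0)
for _ in range(num_episodes):
    pool.apply_async(run_episode, args=(toy, counter, lock, dummy_b, dummy_count))

If this variant is as slow as the 3-lock version, the cost is transmission, not lock contention. You can avoid re-sending the locks entirely by using an initializer when spawning the pool: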
lock_a = None
lock_b = None
lock_counter = None

def initialize_locks(val1, val2, val3):
    global lock_a, lock_b, lock_counter
    lock_a = val1
    lock_b = val2
    lock_counter = val3

...

pool = mp.Pool(processes=num_processes, initializer=initialize_locks, initargs=(lock_a, lock_b, lock_counter,))
Also, if you are using the initializer, you should use multiprocessing.Lock instead, as it is faster than Manager.Lock; the same applies to multiprocessing.Value instead of Manager.Value. Note that these can only be shared through inheritance, which is why they must go through initargs: passing a multiprocessing.Lock in apply_async args raises a RuntimeError, so this only works with the initializer approach.
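Putting the pieces together, here is a minimal self-contained sketch of that pattern, reduced for brevity to a single lock and a single shared counter instead of the question's three variables:

import multiprocessing as mp
import time

lock = None
counter = None

def init_worker(shared_lock, shared_counter):
    # Runs once in each worker when the pool spawns it; the lock and
    # value are inherited here instead of being pickled with every task.
    global lock, counter
    lock = shared_lock
    counter = shared_counter

def run_episode(_):
    with lock:
        counter.value += 1

if __name__ == "__main__":
    lock = mp.Lock()            # raw lock, no Manager server round-trip
    counter = mp.Value('i', 0)  # int living in shared memory
    t0 = time.time()
    with mp.Pool(processes=4, initializer=init_worker,
                 initargs=(lock, counter)) as pool:
        pool.map(run_episode, range(1000))
    print(counter.value, f"{time.time() - t0:.3f} seconds")

Besides skipping the per-task pickling, multiprocessing.Value lives in shared memory, so counter.value += 1 modifies it directly instead of making a round-trip to the Manager's server process.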