I'm learning about sharing variables in multiprocessing. The official docs say

Data can be stored in a shared memory map using Value...

and the example works fine. But I get an error when I try to use it with Pool.map:
from multiprocessing import Value, Pool, Manager

def f(args):
    n, = args
    n.value = 1

if __name__ == '__main__':
    n = Value('d', 0.0)
    # n = Manager().Value('d', 0.0)  # can work around the error
    with Pool(1) as pool:
        pool.map(f, [(n,)])
        # RuntimeError: Synchronized objects should only be shared between processes through inheritance
    print(n.value)
Traceback:

Traceback (most recent call last):
  File "D:\0ly\ly\processvaldoc.py", line 13, in <module>
    pool.map(f, [(n,)])
  File "C:\a3\envs\skl\Lib\multiprocessing\pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\a3\envs\skl\Lib\multiprocessing\pool.py", line 774, in get
    raise self._value
  File "C:\a3\envs\skl\Lib\multiprocessing\pool.py", line 540, in _handle_tasks
    put(task)
  File "C:\a3\envs\skl\Lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\a3\envs\skl\Lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "C:\a3\envs\skl\Lib\multiprocessing\sharedctypes.py", line 199, in __reduce__
    assert_spawning(self)
  File "C:\a3\envs\skl\Lib\multiprocessing\context.py", line 374, in assert_spawning
    raise RuntimeError(
RuntimeError: Synchronized objects should only be shared between processes through inheritance
My Python version is 3.12.9 (64-bit) on Windows 11.
Since there are so many ways to start multiprocessing, I have not read all the documentation or tried them all. I just wonder: what is the essential difference between the documented Process.start() and Pool.map that makes the latter fail?

I searched and learned that Manager().Value can solve this. But what is its magic, since I'm not even using the "with Manager() as ..." style? And if Manager().Value works in both (maybe all) scenarios, why design another multiprocessing.Value that only partly works?
multiprocessing.Value uses shared memory and expects it to be passed to child processes at startup. Pool(...) is what creates the child processes; pool.map only dispatches tasks to them through a queue.

Conceptually there is no technical reason shared memory cannot be passed to children after they start; it is just the way it is implemented. It simplifies the implementation greatly and maybe improves security.

Most operating systems let you create anonymous private shared memory that only a child process can inherit at startup, but you can instead use named shared memory, which can be passed after startup; PyTorch's shared memory does this.
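As a minimal sketch of that named-shared-memory idea, using the standard library's multiprocessing.shared_memory rather than PyTorch (my assumption; the mechanism is the same): only the block's name travels through the task queue, and the long-running worker attaches to the block by name. Note there is no lock here, unlike Value.

```python
import struct
from multiprocessing import Pool, shared_memory

def f(name):
    # Attach to an existing named block; the name arrived via the task queue,
    # long after this worker process was started.
    shm = shared_memory.SharedMemory(name=name)
    shm.buf[:8] = struct.pack('d', 1.0)  # write one double (no locking!)
    shm.close()

if __name__ == '__main__':
    shm = shared_memory.SharedMemory(create=True, size=8)
    shm.buf[:8] = struct.pack('d', 0.0)
    with Pool(1) as pool:
        pool.map(f, [shm.name])  # only the name string is pickled
    print(struct.unpack('d', shm.buf[:8])[0])  # 1.0
    shm.close()
    shm.unlink()
```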
With a Pool, you can pass Value objects to the children at startup using the initializer argument:
from multiprocessing import Pool, Value

global_var = None

def initializer_func(shared_var):
    global global_var
    global_var = shared_var

def f(args):
    print(f"{args=}", flush=True)
    print(f"{global_var=}", flush=True)
    n = global_var
    n.value, = args

if __name__ == '__main__':
    n = Value('d', 0.0)
    with Pool(1, initializer=initializer_func, initargs=(n,)) as pool:
        pool.map(f, [(2,)])
    print(f"{n.value=}")

Output:

args=(2,)
global_var=<Synchronized wrapper for c_double(0.0)>
n.value=2.0
Manager doesn't use shared memory. It is a separate process that stores the objects locally and uses sockets for communication; it is basically Redis implemented in Python to allow RPC calls. You spawn a manager process, and other processes send it packets saying "get this value" and "set this value to x". It is much, much slower, and the spawned process consumes system resources, but any process can connect to it, not just a child process, and not even necessarily one on the same PC if you manually set its port to a public one. It uses some authentication¹ to guard against the obvious vulnerabilities.
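To connect this back to the question, here is a sketch of the Manager().Value workaround: what gets passed to the workers is only a small picklable proxy object, so sending it through the task queue is allowed, and every access to .value becomes an RPC round-trip to the manager process.

```python
from multiprocessing import Pool, Manager

def f(args):
    n, = args
    n.value = 1  # an RPC to the manager process, not a direct memory write

if __name__ == '__main__':
    with Manager() as manager:
        n = manager.Value('d', 0.0)  # a proxy; pickling it is allowed
        with Pool(1) as pool:
            pool.map(f, [(n,)])      # only the proxy is sent over the queue
        print(n.value)               # 1
```

The bare `Manager().Value(...)` style from the question works too; the `with` block just shuts the manager process down deterministically.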
The biggest limitations of shared memory are that it cannot store pointers and it cannot dynamically grow², which means it cannot store generic Python objects; it only stores basic types like int, float, and str (see ShareableList in the docs). Manager, on the other hand, can handle any object, so it is useful for creating a dictionary or a list of complex Python objects.
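A quick sketch of that use case (the nested structure here is made up for illustration): a manager dict can hold arbitrary nested Python objects, which a Value or a raw shared memory block cannot.

```python
from multiprocessing import Pool, Manager

def f(args):
    d, key = args
    # Store an arbitrary nested Python object under this key.
    d[key] = {"squares": [i * i for i in range(key)]}

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        with Pool(2) as pool:
            pool.map(f, [(d, k) for k in (2, 3)])
        print(dict(d))  # both entries present, e.g. {2: {'squares': [0, 1]}, 3: {'squares': [0, 1, 4]}}
```

One caveat: mutating a nested object in place (e.g. `d[2]["squares"].append(9)`) does not propagate back to the manager; you must reassign `d[2]` as a whole.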
1. IMHO, Manager doesn't have enough security; I wouldn't recommend exposing it to the web. Better to use gRPC + SSL if you want actual security.
2. Shared memory can grow under a few OSes, but synchronizing the growth is not practical.