
python multiprocessing: sharing a value with Value not working as documented


I'm learning how to share variables between processes with multiprocessing. The official docs say

Data can be stored in a shared memory map using Value...

and the example works fine.

But I get an error when I try to use it with Pool.map:

from multiprocessing import Value, Pool, Manager


def f(args):
    n, = args
    n.value = 1


if __name__ == '__main__':
    n = Value('d', 0.0)
    # n = Manager().Value('d', 0.0) # can workaround the error
    with Pool(1) as pool:
        pool.map(f, [(n,)])
    # RuntimeError: Synchronized objects should only be shared between processes through inheritance
    print(n.value)

Traceback:

Traceback (most recent call last):
  File "D:\0ly\ly\processvaldoc.py", line 13, in <module>
    pool.map(f, [(n,)])
  File "C:\a3\envs\skl\Lib\multiprocessing\pool.py", line 367, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\a3\envs\skl\Lib\multiprocessing\pool.py", line 774, in get
    raise self._value
  File "C:\a3\envs\skl\Lib\multiprocessing\pool.py", line 540, in _handle_tasks
    put(task)
  File "C:\a3\envs\skl\Lib\multiprocessing\connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\a3\envs\skl\Lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "C:\a3\envs\skl\Lib\multiprocessing\sharedctypes.py", line 199, in __reduce__
    assert_spawning(self)
  File "C:\a3\envs\skl\Lib\multiprocessing\context.py", line 374, in assert_spawning
    raise RuntimeError(
RuntimeError: Synchronized objects should only be shared between processes through inheritance

My Python version is 3.12.9 (64-bit) on Windows 11.

Since there are so many ways to start processes, I haven't read all the docs or tried them all. I'm just wondering: what is the essential difference between the documented Process.start() approach and Pool.map that makes the latter fail?

I searched and learned that Manager().Value can solve this. But what is its magic, given that I'm not even using the with Manager() as ... style? And if Manager().Value works in both (maybe all) scenarios, why design a separate multiprocessing.Value that only partly works?


Solution

  • multiprocessing.Value uses shared memory and expects to be passed to child processes at startup. Pool(...) is what creates the child processes; pool.map only dispatches tasks to the already-running workers through a queue, which means pickling them, and a synchronized Value refuses to be pickled outside of process startup.
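
    For contrast, here is the documented inheritance path: the same Value passed to Process at creation time works, because it is handed to the child while the child starts rather than being pickled through a task queue (a minimal sketch):

    from multiprocessing import Process, Value

    def f(n):
        n.value = 1.0

    if __name__ == '__main__':
        n = Value('d', 0.0)
        # n travels with the spawn of the child ("inheritance"),
        # so assert_spawning allows pickling it here
        p = Process(target=f, args=(n,))
        p.start()
        p.join()
        print(n.value)  # 1.0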

    Conceptually there is no technical reason it could not be passed to children after they start; it is just the way it is implemented. The restriction simplifies the implementation greatly and maybe improves security.

    Most operating systems let you create anonymous, private shared memory that only a child process can inherit at startup, but you can instead use named shared memory, which can be passed around after startup; PyTorch's shared memory does this.
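
    A minimal sketch of that idea with the standard library's multiprocessing.shared_memory (the block size and struct layout are arbitrary choices for illustration): the block's name is an ordinary string, so it pickles fine through pool.map, and the child attaches to the same memory by name. Note this gives you raw bytes with no lock, unlike Value:

    import struct
    from multiprocessing import Pool, shared_memory

    def f(shm_name):
        shm = shared_memory.SharedMemory(name=shm_name)  # attach by name
        struct.pack_into('d', shm.buf, 0, 2.0)           # write a double at offset 0
        shm.close()

    if __name__ == '__main__':
        shm = shared_memory.SharedMemory(create=True, size=8)
        try:
            with Pool(1) as pool:
                pool.map(f, [shm.name])  # only the name string goes through the queue
            print(struct.unpack_from('d', shm.buf, 0)[0])  # 2.0
        finally:
            shm.close()
            shm.unlink()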

    You can pass them to the children of a Pool at startup using the initializer argument:

    from multiprocessing import Pool, Value
    
    global_var = None
    
    def initializer_func(shared_var):
        global global_var
        global_var = shared_var
    
    def f(args):
        print(f"{args=}", flush=True)
        print(f"{global_var=}", flush=True)
        n = global_var   # the Value handed over by the initializer
        n.value, = args  # unpack the 1-element task tuple
    
    
    if __name__ == '__main__':
        n = Value('d', 0.0)
        with Pool(1, initializer=initializer_func, initargs=(n,)) as pool:
            pool.map(f, [(2,)])  # only the plain tuple (2,) is pickled now
        print(f"{n.value=}")
    
    Output:

    args=(2,)
    global_var=<Synchronized wrapper for c_double(0.0)>
    n.value=2.0
    

    Manager doesn't use shared memory. It is a separate process that stores the objects locally and uses sockets for communication; it is basically a small Redis implemented in Python to allow RPC calls.

    You spawn a manager process, and other processes send it messages saying "get this value" or "set this value to x". It is much, much slower, and the extra process consumes system resources, but any process can connect to it, not only child processes, and not even necessarily on the same PC if you manually bind it to a public address. It uses authentication¹ (an HMAC challenge based on the authkey) to get around the potential vulnerabilities.
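
    A minimal sketch of that remote access using multiprocessing.managers.BaseManager (the address, port 50000 and authkey are made-up values for illustration). The server exposes an object over the network:

    from multiprocessing.managers import BaseManager

    class RemoteManager(BaseManager):
        pass

    shared = {}
    # expose a plain dict; callers get back a proxy to it
    RemoteManager.register('get_dict', callable=lambda: shared)

    if __name__ == '__main__':
        mgr = RemoteManager(address=('127.0.0.1', 50000), authkey=b'secret')
        mgr.get_server().serve_forever()

    A client, which can be any process on any machine that can reach that address, then connects with the same authkey:

    RemoteManager.register('get_dict')
    mgr = RemoteManager(address=('127.0.0.1', 50000), authkey=b'secret')
    mgr.connect()
    mgr.get_dict()['x'] = 1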


    The biggest limitations of shared memory are that it cannot store pointers and it cannot dynamically grow², which means it cannot store generic Python objects; it only stores basic types like int, float and str (see the types supported by ShareableList). A Manager, on the other hand, can handle any picklable object, so it is useful for creating a dictionary or a list of complex Python objects.
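
    For example, a Manager().dict() proxy can hold arbitrary nested objects and can be sent through pool.map without the inheritance restriction (a minimal sketch):

    from multiprocessing import Manager, Pool

    def f(args):
        d, key = args
        d[key] = {'nested': [1, 2, 3]}  # any picklable object works

    if __name__ == '__main__':
        with Manager() as manager:
            d = manager.dict()  # a proxy, which pickles fine
            with Pool(1) as pool:
                pool.map(f, [(d, 'a')])
            print(dict(d))  # {'a': {'nested': [1, 2, 3]}}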

    1. IMHO the Manager's authkey handshake is not enough security (the traffic itself is not encrypted); I wouldn't recommend exposing it to the web. Better use gRPC + SSL if you want actual security.

    2. Shared memory can grow under a few OSes, but synchronizing that growth between processes is not practical.