Tags: multiprocessing, pool, python-3.12, ubuntu-24.04

How does multiprocessing.Pool() create its child processes?


Overview:

I am trying to use a Pool internally in a module other than __main__, so that __main__ does not need to know the pool exists.

Because of this, an if __name__ == "__main__": guard is not applicable.

When I start my Pool at the top level of the module, I get an AttributeError for functions declared afterwards at the top level of the same module. See the minimal reproducible example at the end.

This issue has been mentioned in "Python multiprocessing.Pool: AttributeError", where the answer states:

You are starting the pool before you define your function and classes, that way the child processes cannot inherit any code.

What I found out

From my understanding, the child process is created in module __main__ at import time and then copies everything from the imported module up to the pool definition.

I understand that, with this approach, the pool is going to be started in my child process as well. However, this does not seem to be the reason for the AttributeError, because moving the function definition before the start of the pool does not throw any errors.

My Question

How does the Pool exactly initialize these child processes and why was it implemented this way? What errors does this prevent compared to importing the whole test module into the child process?

Minimal Example:

test.py

from multiprocessing import Pool

_pool = Pool(2)

def script_wrapper():
    print("foo")

def script_runner():
    _pool.apply_async(script_wrapper)
    _pool.close()
    _pool.join()

main.py

from test import script_runner

if __name__ == "__main__":
    script_runner()

Error when running main.py

Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 114, in worker
    task = get()
           ^^^^^
  File "/usr/lib/python3.12/multiprocessing/queues.py", line 389, in get
    return _ForkingPickler.loads(res)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'script_wrapper' on <module 'test' from '/home/user/Documents/code/multiproc_error/test.py'>
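The traceback ends in _ForkingPickler.loads, i.e. while the worker is unpickling the task. Below is a minimal standard-library sketch of why that lookup can fail. It assumes a throwaway module registered under the made-up name demo_mod as a stand-in for test.py. Deleting the function afterwards simulates a worker whose copy of the module stopped before the def ran.

```python
import pickle
import sys
import types

# Hypothetical stand-in for test.py: a module registered in sys.modules
# under the made-up name "demo_mod", with script_wrapper defined inside it.
mod = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = mod
exec("def script_wrapper():\n    return 'foo'\n", mod.__dict__)

# A module-level function is pickled by reference: the payload holds the
# module name and the function name, not the function's code.
payload = pickle.dumps(mod.script_wrapper)
print(b"demo_mod" in payload, b"script_wrapper" in payload)  # True True

# While the attribute exists, unpickling simply looks it up again.
restored = pickle.loads(payload)
print(restored())  # foo

# Simulate a worker whose copy of the module never reached the def.
del mod.script_wrapper
try:
    pickle.loads(payload)
    lookup_failed = False
except AttributeError as exc:
    lookup_failed = True
    print("AttributeError:", exc)
```

The same by-name lookup happens in each Pool worker when it receives a task, which is why the error names the module test and the attribute script_wrapper.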

Solution

  • multiprocessing.Pool creates its worker processes with multiprocessing.Process. It has all the same limitations as calling multiprocessing.Process manually, including the need for an if __name__ == '__main__' guard, and you're not going to get around that limitation. But that limitation mostly matters under the spawn start method (the default on Windows and macOS), and you do have an if __name__ == '__main__' guard in main.py anyway. That's not what's causing the current problem.

    You're using the default start method, and you appear to be on a platform where the default is fork (as it is on Linux with Python 3.12). That means your child processes are created by forking your main process, so at the moment of creation each child is in a state almost identical to that of the parent.

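    In particular, you create the child processes in the middle of importing test, so the child processes will also be in the middle of importing test. The child processes then start executing pool worker logic and never get a chance to finish importing test. No code in test after the Pool(2) call ever runs in the workers, so script_wrapper is never defined there. When apply_async sends the task, the function is pickled by reference (module name plus function name), and the worker's attempt to look up test.script_wrapper fails with the AttributeError shown above.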

    None of this is specific to Pool - if you had made worker processes with two manual calls to Process, the workers still wouldn't see anything defined later.