python, python-multiprocessing, runpy

Using multiprocessing with runpy


I have a Python module that uses multiprocessing. I'm executing this module from another script with runpy. However, this results in (1) the module running twice, and (2) the multiprocessing jobs never finishing (the script just hangs).

In my minimal working example, I have a script runpy_test.py:

import runpy
runpy.run_module('module_test')

and a directory module_test containing an empty __init__.py and a __main__.py:

from multiprocessing import Pool

print 'start'
def f(x):
    return x*x
pool = Pool()
result = pool.map(f, [1,2,3])
print 'done'

When I run runpy_test.py, I get:

start
start

and the script hangs.

If I remove the pool.map call (or if I run __main__.py directly, including the pool.map call), I get:

start
done

I'm running this on Scientific Linux 7.6 in Python 2.7.5.


Solution

  • Rewrite your __main__.py like so:

    from multiprocessing import Pool
    from .implementation import f
    
    print 'start'
    pool = Pool()
    result = pool.map(f, [1,2,3])
    print 'done'
    

    And then write an implementation.py (you can call this whatever you want) in which your function is defined:

    def f(x):
        return x*x
    

    Otherwise you will hit the same problem with most of the multiprocessing interfaces, independently of runpy. As @Weeble explained, when Pool.map tries to load the function f in each sub-process, it imports <your_package>.__main__, where the function is defined; since you have executable code at module level in __main__, that code is re-executed by each sub-process.

    Aside from this technical reason, the split is also better design in terms of separation of concerns and testing: you can now import and call the function f (including for test purposes) without running it in parallel.
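With f pulled out of __main__, the split is easy to sanity-check. Here is a minimal self-contained sketch (f is defined inline to stand in for the separate implementation.py, and the script-level code is additionally placed behind the usual `__name__` guard; written to run under both Python 2 and 3):

```python
from multiprocessing import Pool

def f(x):
    # Top-level, importable function: sub-processes can load it
    # without re-executing any script-level code.
    return x * x

if __name__ == '__main__':
    pool = Pool()
    result = pool.map(f, [1, 2, 3])
    pool.close()
    pool.join()
    print(result)  # [1, 4, 9]
```

Because f lives at module level (not inside the guarded block), worker processes can import it cleanly whether they are started by fork or by spawn.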