On a standard installation of Python (e.g. via Miniconda), I run this script (also pasted below) and get the following output:
python test_python_multiprocessing.py
arg1: called directly
sys.executable: C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
-----
arg1: called via multiprocessing
sys.executable: C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
-----
The two exes:
C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
This is what I expect.
When I run the same script with a Python from a virtual environment whose base interpreter is a Python embedded in a scriptable application, I get the following result:
arg1: called directly
sys.executable: C:\[virtual_environment_path]\Scripts\python.exe
-----
arg1: called via multiprocessing
sys.executable: C:\[application_with_python]\contrib\Python37\python.exe
-----
The two exes:
C:\[virtual_environment_path]\Scripts\python.exe
C:\[application_with_python]\contrib\Python37\python.exe
Traceback (most recent call last):
  File ".\test_python_multiprocessing.py", line 67, in <module>
    test_exes()
  File ".\test_python_multiprocessing.py", line 64, in test_exes
    assert exe1 == exe2
AssertionError
Crucially, the child's sys.executable does not match the parent's sys.executable, but instead matches the parent's base interpreter.
I suspected that the Python that ships with the application had been altered, perhaps to make spawned processes point to a hard-coded Python path. I have taken a look at the Python standard library that ships with the application, and I do not find any discrepancy that explains this difference in behavior.
I tried manually setting the executable to what the default should be before creating the multiprocessing.Process, using multiprocessing.set_executable(sys.executable) or multiprocessing.get_context("spawn").set_executable(sys.executable). Neither has any effect.
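For reference, this is the shape of the attempted workaround (a minimal sketch; set_executable is the documented hook for the spawn start method, and the print target is just a placeholder):

import multiprocessing
import sys

if __name__ == "__main__":
    # Attempt to pin the child interpreter to the parent's executable
    # before any Process is created. In the environment described above,
    # both variants are silently ignored and the base Python is launched.
    multiprocessing.set_executable(sys.executable)
    ctx = multiprocessing.get_context("spawn")
    ctx.set_executable(sys.executable)
    p = ctx.Process(target=print, args=("hello from the child",))
    p.start()
    p.join()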
What are possible explanations for the difference in behavior between a standard python installation, and this python that is embedded within a scriptable application? How can I investigate the cause, and force the correct python executable to be used when spawning child processes?
test_python_multiprocessing.py:
import multiprocessing

def functionality(arg1):
    import sys
    print("arg1: " + str(arg1))
    print("sys.executable: " + str(sys.executable))
    print("-----\n")
    return sys.executable

def worker(queue, arg1):
    import traceback
    try:
        retval = functionality(arg1)
        queue.put({"retval": retval})
    except Exception as e:
        queue.put({"exception": e, "traceback_str": traceback.format_exc()})
        raise

def spawn_worker(arg1):
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue, arg1))
    p.start()
    p.join()
    err_or_ret = queue.get()
    handle_worker_err(err_or_ret)
    if p.exitcode != 0:
        raise RuntimeError("Subprocess failed with code " + str(p.exitcode) + ", but no exception was thrown.")
    return err_or_ret["retval"]

def handle_worker_err(err_or_ret):
    if "retval" in err_or_ret:
        return None
    err = err_or_ret
    if err is not None:
        # TODO: use e.g. tblib to re-raise with the child's traceback object;
        # for now, the traceback string formatted in the child is printed.
        print("The exception was thrown in the child process, reraised in parent process:")
        print(err["traceback_str"])
        raise err["exception"]

def test_exes():
    exe1 = functionality("called directly")
    exe2 = spawn_worker("called via multiprocessing")
    print("The two exes:")
    print(exe1)
    print(exe2)
    assert exe1 == exe2

if __name__ == "__main__":
    test_exes()
[EDIT] The fact that I detected the issue on a Python embedded in a scriptable application is a red herring: making a virtual environment with a "standard install" Python 3.7.4 base exhibits the same issue.
Long story short: using the "virtual" interpreter causes bugs in multiprocessing, so the developers decided to redirect virtualenv environments to the base interpreter; see issue 35797.
This comment is pulled from popen_spawn_win32.py:
# bpo-35797: When running in a venv, we bypass the redirect
# executor and launch our base Python.
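A quick way to confirm which executable the child will get is to compare the interpreters the spawn machinery knows about (a sketch; sys._base_executable is a CPython implementation detail introduced alongside that fix, hence the getattr guard):

import multiprocessing.spawn
import sys

# In a venv on Windows, popen_spawn_win32 compares its configured
# executable to sys.executable and, when they match, launches the base
# interpreter recorded in sys._base_executable instead.
print("sys.executable:      ", sys.executable)
print("sys._base_executable:", getattr(sys, "_base_executable", "<not present>"))
print("spawn executable:    ", multiprocessing.spawn.get_executable())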
One solution is to use subprocess instead, and connect your "pipes" through a socket to a manager rather than through multiprocessing's own machinery. You can see how to connect to a manager over a socket in the BaseManager documentation; Python makes it as simple as plugging in its port number.
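A minimal single-file sketch of that idea (the port, authkey, and names such as QueueManager are illustrative; the child is launched with subprocess, which honours sys.executable and therefore the venv interpreter):

import queue
import subprocess
import sys
import threading
from multiprocessing.managers import BaseManager

ADDRESS = ("127.0.0.1", 50000)  # assumes this port is free
AUTHKEY = b"not-a-secret"

class QueueManager(BaseManager):
    pass

def run_worker():
    # Child: connect to the parent's manager over the socket and report back.
    QueueManager.register("get_queue")
    mgr = QueueManager(address=ADDRESS, authkey=AUTHKEY)
    mgr.connect()
    mgr.get_queue().put(sys.executable)

def run_parent():
    results = queue.Queue()
    QueueManager.register("get_queue", callable=lambda: results)
    mgr = QueueManager(address=ADDRESS, authkey=AUTHKEY)
    server = mgr.get_server()  # binds the listening socket synchronously
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Launch the child with the venv's python.exe, not via multiprocessing.
    subprocess.run([sys.executable, __file__, "--worker"], check=True)
    print("child sys.executable:", results.get(timeout=10))

if __name__ == "__main__":
    run_worker() if "--worker" in sys.argv else run_parent()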
You can also try pathos, as its multiprocessing implementation is "different" (I think its pools use sockets, but I didn't dig into it, and it has other problems stemming from the way it spawns new workers differently; still, it can work in a few weird environments where multiprocessing fails).
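If you want to try it, here is a minimal pathos sketch (assumes pathos is installed, e.g. via pip; untested in the venv scenario above):

from pathos.multiprocessing import ProcessingPool

def square(x):
    # pathos serialises with dill, so closures and interactively
    # defined functions can also be shipped to workers.
    return x * x

if __name__ == "__main__":
    pool = ProcessingPool(nodes=2)
    print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
    pool.close()
    pool.join()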
Edit: another nice parallelizing alternative that actually uses sockets is Dask, but you have to start the workers separately, not through its built-in pool.
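A rough sketch of that setup (assumes the distributed package is installed; dask-scheduler and dask-worker are its CLI entry points, started here from separate shells so this interpreter never spawns workers itself):

# shell 1: dask-scheduler            (serves on tcp://<host>:8786 by default)
# shell 2: dask-worker tcp://127.0.0.1:8786
from dask.distributed import Client

def square(x):
    return x * x

if __name__ == "__main__":
    # The client only opens a socket to the externally started scheduler.
    client = Client("tcp://127.0.0.1:8786")
    futures = client.map(square, range(5))
    print(client.gather(futures))  # [0, 1, 4, 9, 16]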