On a standard installation of Python (e.g. via Miniconda), I run this script (also pasted below) and get the following output:
python test_python_multiprocessing.py
arg1: called directly
sys.executable: C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
-----
arg1: called via multiprocessing
sys.executable: C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
-----
The two exes:
C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
C:\ProgramData\Miniconda3\envs\python3_7_4\python.exe
This is what I expect.
When I run the same script with a Python from a virtual environment whose base interpreter is a Python embedded in a scriptable application, I get the following result:
arg1: called directly
sys.executable: C:\[virtual_environment_path]\Scripts\python.exe
-----
arg1: called via multiprocessing
sys.executable: C:\[application_with_python]\contrib\Python37\python.exe
-----
The two exes:
C:\[virtual_environment_path]\Scripts\python.exe
C:\[application_with_python]\contrib\Python37\python.exe
Traceback (most recent call last):
  File ".\test_python_multiprocessing.py", line 67, in <module>
    test_exes()
  File ".\test_python_multiprocessing.py", line 64, in test_exes
    assert exe1 == exe2
AssertionError
Crucially, the child's sys.executable does not match the parent's sys.executable, but instead matches the parent's base interpreter.
I suspected that the Python that ships with the application had been altered, perhaps to make spawned processes point to a hard-coded Python path. I have taken a look at the Python standard library that ships with the application, and I do not find any discrepancy that explains this difference in behavior.
I tried manually setting the executable to what the default should be before creating the multiprocessing.Process, using multiprocessing.set_executable(sys.executable) or multiprocessing.get_context("spawn").set_executable(sys.executable). Neither has any effect.
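For reference, this is the shape of the attempted workaround (a minimal sketch; set_executable is the documented hook for the spawn start method, and the print target is just a placeholder):

import multiprocessing
import sys

if __name__ == "__main__":
    # Attempt to pin the child interpreter to the parent's executable
    # before any Process is created. In the environment described above,
    # both variants are silently ignored and the base Python is launched.
    multiprocessing.set_executable(sys.executable)
    ctx = multiprocessing.get_context("spawn")
    ctx.set_executable(sys.executable)
    p = ctx.Process(target=print, args=("hello from the child",))
    p.start()
    p.join()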
What are possible explanations for the difference in behavior between a standard python installation, and this python that is embedded within a scriptable application? How can I investigate the cause, and force the correct python executable to be used when spawning child processes?
test_python_multiprocessing.py:
import multiprocessing

def functionality(arg1):
    import sys
    print("arg1: " + str(arg1))
    print("sys.executable: " + str(sys.executable))
    print("-----\n")
    return sys.executable

def worker(queue, arg1):
    import traceback
    try:
        retval = functionality(arg1)
        queue.put({"retval": retval})
    except Exception as e:
        queue.put({"exception": e, "traceback_str": traceback.format_exc()})
        raise

def spawn_worker(arg1):
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue, arg1))
    p.start()
    p.join()
    err_or_ret = queue.get()
    handle_worker_err(err_or_ret)
    if p.exitcode != 0:
        raise RuntimeError("Subprocess failed with code " + str(p.exitcode) + ", but no exception was thrown.")
    return err_or_ret["retval"]

def handle_worker_err(err_or_ret):
    if "retval" in err_or_ret:
        return None
    err = err_or_ret
    if err is not None:
        # TODO: use e.g. tblib to re-raise with the child's traceback object;
        # for now, the traceback string formatted in the child is printed.
        print("The exception was thrown in the child process, reraised in parent process:")
        print(err["traceback_str"])
        raise err["exception"]

def test_exes():
    exe1 = functionality("called directly")
    exe2 = spawn_worker("called via multiprocessing")
    print("The two exes:")
    print(exe1)
    print(exe2)
    assert exe1 == exe2

if __name__ == "__main__":
    test_exes()
[EDIT] The fact that I detected the issue on a Python embedded in a scriptable application is a red herring: making a virtual environment with a "standard install" Python 3.7.4 base exhibits the same issue.
Long story short: using the "virtual" interpreter causes bugs in multiprocessing, so the developers decided to redirect virtualenv environments to the base interpreter; see issue 35797.
This comment is pulled from popen_spawn_win32.py:
# bpo-35797: When running in a venv, we bypass the redirect
# executor and launch our base Python.
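A quick way to confirm which executable the child will get is to compare the interpreters the spawn machinery knows about (a sketch; sys._base_executable is a CPython implementation detail introduced alongside that fix, hence the getattr guard):

import multiprocessing.spawn
import sys

# In a venv on Windows, popen_spawn_win32 compares its configured
# executable to sys.executable and, when they match, launches the base
# interpreter recorded in sys._base_executable instead.
print("sys.executable:      ", sys.executable)
print("sys._base_executable:", getattr(sys, "_base_executable", "<not present>"))
print("spawn executable:    ", multiprocessing.spawn.get_executable())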
One solution is to use subprocess instead, and connect your "pipes" through a socket to a manager rather than through multiprocessing's own machinery. You can see how to connect to a manager over a socket in the BaseManager documentation; Python makes it as simple as plugging in its port number.
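A minimal single-file sketch of that idea (the port, authkey, and names such as QueueManager are illustrative; the child is launched with subprocess, which honours sys.executable and therefore the venv interpreter):

import queue
import subprocess
import sys
import threading
from multiprocessing.managers import BaseManager

ADDRESS = ("127.0.0.1", 50000)  # assumes this port is free
AUTHKEY = b"not-a-secret"

class QueueManager(BaseManager):
    pass

def run_worker():
    # Child: connect to the parent's manager over the socket and report back.
    QueueManager.register("get_queue")
    mgr = QueueManager(address=ADDRESS, authkey=AUTHKEY)
    mgr.connect()
    mgr.get_queue().put(sys.executable)

def run_parent():
    results = queue.Queue()
    QueueManager.register("get_queue", callable=lambda: results)
    mgr = QueueManager(address=ADDRESS, authkey=AUTHKEY)
    server = mgr.get_server()  # binds the listening socket synchronously
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # Launch the child with the venv's python.exe, not via multiprocessing.
    subprocess.run([sys.executable, __file__, "--worker"], check=True)
    print("child sys.executable:", results.get(timeout=10))

if __name__ == "__main__":
    run_worker() if "--worker" in sys.argv else run_parent()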
You can also try pathos, as its multiprocessing implementation is "different" (I think its pools use sockets, but I didn't dig into it, and it has other problems stemming from the way it spawns new workers differently; still, it can work in a few weird environments where multiprocessing fails).
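If you want to try it, here is a minimal pathos sketch (assumes pathos is installed, e.g. via pip; untested in the venv scenario above):

from pathos.multiprocessing import ProcessingPool

def square(x):
    # pathos serialises with dill, so closures and interactively
    # defined functions can also be shipped to workers.
    return x * x

if __name__ == "__main__":
    pool = ProcessingPool(nodes=2)
    print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]
    pool.close()
    pool.join()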
Edit: another nice parallelizing alternative that actually uses sockets is Dask, but you have to start the workers separately, not through its built-in pool.
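A rough sketch of that setup (assumes the distributed package is installed; dask-scheduler and dask-worker are its CLI entry points, started here from separate shells so this interpreter never spawns workers itself):

# shell 1: dask-scheduler            (serves on tcp://<host>:8786 by default)
# shell 2: dask-worker tcp://127.0.0.1:8786
from dask.distributed import Client

def square(x):
    return x * x

if __name__ == "__main__":
    # The client only opens a socket to the externally started scheduler.
    client = Client("tcp://127.0.0.1:8786")
    futures = client.map(square, range(5))
    print(client.gather(futures))  # [0, 1, 4, 9, 16]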