My actual problem is quite lengthy, and I believe it could benefit from multi-processing. The crux of the problem is as follows: I have some multiprocessing function that takes in two values (x , y) outputs a single number Q. For illustration:
def multiprocessing_func(x , y):
Q = x*y
(The actual function is much more complicated and involves running a simulation for input parameters x and y) I have two arrays of x and y values e.g.:
x = np.linspace(0 , 1 , 10)
y = np.linspace(0 , 1 , 10)
I would like to compile the values of Q from multiprocessing_func
into a matrix Q_matrix
:
import multiprocessing
if __name__ == '__main__':
processes = []
for m in range(len(x)):
for n in range(len(y)):
p = multiprocessing.Process(target = multiprocessing_func , args=(x[m] , y[n]))
processes.append(p)
p.start()
for process in processes:
process.join()
So far my attempts have involved using return_dict
in my multiprocessing functions. The return_dict
simply compiles all the return values in a list. However, of course, that gives the wrong dimensionality. Essentially, I am wondering if there is a multiprocessing equivalent of this set-up:
x = np.linspace(0 , 1 , 10)
y = np.linspace(0 , 1 , 10)
Q_matrix = np.zeros(shape = (len(x) , len(y)))
for m in range(len(x)):
for n in range(len(y)):
Q_matrix[m , n] = x[m]*y[n]
I am sure there is a simple solution to this, but I am quite new to multi-processing so any help is greatly appreciated.
There is an overhead in creating subprocesses and passing arguments to the processes that "live" in a different address space. So unless your worker function, multiprocessing_func
is considerably more CPU-intensive than what you currently have to make this additional overhead worth the cost of using multiprocessing, you are better off not using it. But this is how you could using a multiprocessing pool whose size is limited by either the number of tasks you have to submit or the number of CPU cores you have.
from concurrent.futures import ProcessPoolExecutor, as_completed
import numpy as np
import os
def multiprocessing_func(x, y):
return x * y
if __name__ == '__main__':
x = np.linspace(0 , 1 , 10)
y = np.linspace(0 , 1 , 10)
Q_matrix = np.zeros(shape = (len(x) , len(y)))
pool_size = min(os.cpu_count(), len(x) * len(y))
with ProcessPoolExecutor(max_workers=pool_size) as executor:
# create mapping between the Future instance returned by submit and the original m, n indexes:
futures = {executor.submit(multiprocessing_func, x[m], y[n]): (m, n) for m in range(len(x)) for n in range(len(y))}
for future in as_completed(futures):
m, n = futures[future] # original indexes
result = future.result()
Q_matrix[m][n] = result
print(Q_matrix)
Prints:
[[0. 0. 0. 0. 0. 0.
0. 0. 0. 0. ]
[0. 0.01234568 0.02469136 0.03703704 0.04938272 0.0617284
0.07407407 0.08641975 0.09876543 0.11111111]
[0. 0.02469136 0.04938272 0.07407407 0.09876543 0.12345679
0.14814815 0.17283951 0.19753086 0.22222222]
[0. 0.03703704 0.07407407 0.11111111 0.14814815 0.18518519
0.22222222 0.25925926 0.2962963 0.33333333]
[0. 0.04938272 0.09876543 0.14814815 0.19753086 0.24691358
0.2962963 0.34567901 0.39506173 0.44444444]
[0. 0.0617284 0.12345679 0.18518519 0.24691358 0.30864198
0.37037037 0.43209877 0.49382716 0.55555556]
[0. 0.07407407 0.14814815 0.22222222 0.2962963 0.37037037
0.44444444 0.51851852 0.59259259 0.66666667]
[0. 0.08641975 0.17283951 0.25925926 0.34567901 0.43209877
0.51851852 0.60493827 0.69135802 0.77777778]
[0. 0.09876543 0.19753086 0.2962963 0.39506173 0.49382716
0.59259259 0.69135802 0.79012346 0.88888889]
[0. 0.11111111 0.22222222 0.33333333 0.44444444 0.55555556
0.66666667 0.77777778 0.88888889 1. ]]