I have two dense matrices of sizes (2500, 208) and (208, 2500), and I want to compute their product. In a single process this runs fine and fast, but inside a multiprocessing block the processes get stuck on it for hours. I also multiply sparse matrices of even larger sizes in the same code without any problem. My code looks like this:
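```python
from multiprocessing import Pool
import numpy as np

def run_func(*args):  # starmap unpacks each tuple in `args` into positional arguments
    # Do stuff (builds dense A (2500, 208) and B (208, 2500)), including large sparse matrix multiplications.
    C = np.matmul(A, B)  # or A.dot(B), or calling BLAS dgemm directly: dgemm(1, A, B)
    # Execution never gets past the line above!

if __name__ == "__main__":
    with Pool(processes=agents) as pool:
        result = pool.starmap(run_func, args)
```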
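A self-contained version of the snippet above, useful for checking whether the dense product alone hangs on the cluster, might look like the sketch below. It is not the original code: random matrices with the stated shapes stand in for the real data, `N_AGENTS` is a hypothetical stand-in for `agents`, and it explicitly uses the `fork` start method, which was the Linux default before Python 3.14 and is therefore what the cluster most likely used.

```python
import multiprocessing as mp
import numpy as np

N_AGENTS = 2  # hypothetical stand-in for `agents`

def run_func(seed, n=2500, k=208):
    # Build random dense matrices with the question's shapes: (n, k) and (k, n).
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, k))
    B = rng.standard_normal((k, n))
    C = A @ B  # the dense product that never returns on the cluster
    return C.shape

if __name__ == "__main__":
    # Explicitly use "fork", the Linux default start method before Python 3.14.
    ctx = mp.get_context("fork")
    with ctx.Pool(processes=N_AGENTS) as pool:
        print(pool.starmap(run_func, [(seed,) for seed in range(N_AGENTS)]))
        # expected: [(2500, 2500), (2500, 2500)]
```

If this minimal version also hangs under `srun`, the problem lies in the BLAS/multiprocessing interaction rather than in the rest of `run_func`.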
Note that when `run_func` is executed in a single process, it works fine. Multiprocessing on my local machine also works fine. Only when I run it with multiprocessing on the HPC does it get stuck. I allocate my resources like this:
```shell
srun -v --nodes=1 --time 7-0:0 --cpus-per-task=2 --mem-per-cpu=20G python3 -u run.py 2
```

where the last argument is the number of agents (`agents` in the code above). Here are the BLAS/LAPACK build details on the HPC, as reported by NumPy:
```
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['**/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['**/include']
```
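Since this NumPy build links against MKL (`mkl_rt`), each worker process may start its own MKL thread pool, and on a node with many more cores than the two Slurm granted, that can oversubscribe the allocation. One experiment worth trying is to cap the MKL/OpenMP thread count per worker based on `SLURM_CPUS_PER_TASK`. The sketch below assumes the limits are set before NumPy (and therefore MKL) is first imported; `blas_thread_env` is a hypothetical helper, not part of any library.

```python
import os

def blas_thread_env(n_agents, default_cpus=1):
    # Slurm exports SLURM_CPUS_PER_TASK when --cpus-per-task is given.
    cpus = int(os.environ.get("SLURM_CPUS_PER_TASK", default_cpus))
    # Split the granted CPUs evenly across workers, never going below one thread.
    per_worker = max(1, cpus // n_agents)
    return {"MKL_NUM_THREADS": str(per_worker), "OMP_NUM_THREADS": str(per_worker)}

if __name__ == "__main__":
    # Must happen before NumPy is imported, because MKL reads these at load time.
    os.environ.update(blas_thread_env(n_agents=2))
    import numpy as np  # imported only after the limits are in place
    print(os.environ["MKL_NUM_THREADS"], np.__version__)
```

With `--cpus-per-task=2` and two agents, this gives each worker a single BLAS thread.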
The Python version and all Python packages on the HPC are the same as on my local machine. Any leads on what is going on?
As a workaround, I switched from multiprocessing to multithreading, and that resolved the issue. I am still not sure what the underlying problem with multiprocessing is, though.
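One way to implement that thread-based workaround (not necessarily the exact code used here) is `multiprocessing.pool.ThreadPool`, which offers the same `starmap` interface as `Pool`. The sketch assumes the heavy work is dominated by NumPy/BLAS calls, which typically release the GIL, so the threads can still overlap; the random input matrices are placeholders for the real data.

```python
from multiprocessing.pool import ThreadPool
import numpy as np

def run_func(A, B):
    # NumPy releases the GIL inside the BLAS call, so threads can run this concurrently.
    return A @ B

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    args = [(rng.standard_normal((2500, 208)), rng.standard_normal((208, 2500)))
            for _ in range(2)]
    # Same call shape as Pool.starmap, but the workers are threads in one process.
    with ThreadPool(processes=2) as pool:
        results = pool.starmap(run_func, args)
    print([r.shape for r in results])
```

Because the threads share a single process, there is only one MKL runtime and no fork after it has been initialized, which may be why this variant avoids the hang.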