Tags: python, numpy, parallel-processing, blas, numpy-ufunc

How do I make np.multiply use more than one core?


The title says it already. I am currently parallelizing my code, and a major bottleneck is the element-wise multiplication of two three-dimensional ndarrays. My system monitor shows that only one of the 40 available cores is used for that operation.

I know parallelization works, because the other scipy.fft and BLAS operations run in parallel.

So far, I have not really found any meaningful questions/issues on SO or GitHub. It is a bit bewildering that no one else has had this issue. Am I missing something?

I tried playing with the BLAS environment variables and using dgbmv on flattened arrays to achieve the desired behaviour, but I have not been successful yet. A minimal code example would be (in my case with much larger k, 3D arrays, and broadcasting involved):

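import numpy as np
k = int(1e6)  # the size must be an int; np.random.rand(1e6) raises a TypeError
x = np.random.rand(k)
y = np.random.rand(k)
z = np.multiply(x, y)  # element-wise product, runs on a single core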

Solution

  • Have a look at numexpr: https://pypi.org/project/numexpr/2.6.1/

    This library evaluates array expressions with multiple threads, so it can use more than one core.

    You can use it like this:

    import numpy as np
    import numexpr as ne
    
    k = int(1e6)
    x = np.random.rand(k)
    y = np.random.rand(k)
    
    z = ne.evaluate('x * y')
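
If adding a dependency is not an option, a different technique (not the numexpr approach above) is to split the arrays along the first axis and call np.multiply on each block from a thread pool. This relies on NumPy releasing the GIL inside ufunc loops for numeric dtypes. The sketch below is only an illustration under that assumption: parallel_multiply and n_threads are hypothetical names, not part of NumPy, and since element-wise multiplication is largely limited by memory bandwidth, the speedup may well be smaller than the number of threads.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def parallel_multiply(x, y, n_threads=4):
    """Element-wise x * y, split along axis 0 across worker threads.

    Hypothetical helper for illustration; not a NumPy function.
    """
    # Broadcast both inputs to the common shape as views (no data is copied).
    xb, yb = np.broadcast_arrays(x, y)
    if xb.ndim == 0 or xb.shape[0] < n_threads:
        # Too small to split usefully: fall back to the plain ufunc.
        return np.multiply(xb, yb)

    out = np.empty(xb.shape, dtype=np.result_type(x, y))
    # Block boundaries along the first axis, one block per thread.
    bounds = np.linspace(0, xb.shape[0], n_threads + 1, dtype=int)

    def work(lo, hi):
        # Each thread writes its own disjoint slice of the output.
        np.multiply(xb[lo:hi], yb[lo:hi], out=out[lo:hi])

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(work, bounds[:-1], bounds[1:]))
    return out


rng = np.random.default_rng(0)
x = rng.random((64, 32, 16))
y = rng.random((32, 1))  # broadcasts against x to shape (64, 32, 16)
z = parallel_multiply(x, y, n_threads=4)
print(np.array_equal(z, np.multiply(x, y)))
```

Splitting along the first axis keeps every block contiguous in a C-ordered output, and because the blocks are disjoint the threads need no locking.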