
np.random.choice conflict with multiprocessing? multiprocessing inside for loop?


I want to use np.random.choice inside a multiprocessing pool, but I get IndexError: list index out of range. The same call works when I run it serially in a plain for loop. This is only a small part of my routine, but parallelizing it should speed things up considerably. My code is below. I declare X at the top of the module so that it acts as a global variable, but it is populated dynamically inside the main block. I also noticed some conflict between multiprocessing and the for loop. Any ideas on how to fix this?

from multiprocessing import Pool
from numpy.random import choice
import numpy as np

K = 10
X = []

def function(k):
   
   global X
   np.random.RandomState(k)

   aux = [i for i in np.arange(K) if i != k]
   a,b,c = choice(aux,3,replace=False)
   x = X[a]+0.7*(X[b]-X[c])
   return x

if __name__ == '__main__':
    
    X = np.arange(K)

    for n in range(K):
    
        pool = Pool(K)
        w = pool.map(function,np.arange(K))
        pool.close()
    
        print(w)


Solution

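  • Child processes do not share the memory space of the parent process. Because you populate X inside the if __name__ == '__main__' clause, child processes that re-import the module (the spawn start method, which is the default on Windows and, since Python 3.8, on macOS) never run that assignment. They only see the X defined at module level, i.e. X = [], and indexing that empty list raises the IndexError.

    A quick fix is to move the line X = np.arange(K) out of that clause, as below: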

    from multiprocessing import Pool
    from numpy.random import choice
    import numpy as np
    
    K = 10
    # Defined at module level, so every child process sees it on import.
    X = np.arange(K)
    
    
    def function(k):
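        # No "global X" is needed: X is only read here, never reassigned.
        # NOTE: this builds a RandomState and discards it, so it does not
        # seed choice(), which draws from NumPy's global generator.
        np.random.RandomState(k)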
    
        aux = [i for i in np.arange(K) if i != k]
        a, b, c = choice(aux, 3, replace=False)
        x = X[a] + 0.7 * (X[b] - X[c])
        return k, x
    
    
    if __name__ == '__main__':
        pool = Pool(K)
        w = pool.map(function, np.arange(K))
        pool.close()
        pool.join()  # wait for the workers to exit
    
        print(w)
    

    Output (the draws are random, so the values vary between runs)

    [(0, 10.899999999999999), (1, 9.4), (2, 5.7), (3, 7.4), (4, 1.1000000000000005), (5, -1.0999999999999996), (6, 5.6), (7, 3.8), (8, 5.5), (9, -4.8999999999999995)]
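
When X can only be built at run time, as the question describes, moving it to module level may not be an option. One possible alternative, sketched here under assumptions, is to hand X to each worker once through the initializer and initargs parameters of Pool, and to reuse a single pool across the loop iterations rather than creating a new one each time. The names init_worker and run, and the pool size of 4, are illustrative and not part of the original code.

```python
from multiprocessing import Pool

import numpy as np

K = 10
X = None  # filled in inside each worker by init_worker (hypothetical helper)


def init_worker(values):
    # Runs once in every worker process when the pool starts.
    global X
    X = values


def function(k):
    # Candidate indices: every index except k.
    aux = [i for i in range(K) if i != k]
    a, b, c = np.random.choice(aux, 3, replace=False)
    return k, X[a] + 0.7 * (X[b] - X[c])


def run(n_rounds=3):
    values = np.arange(K)  # built at run time, as in the question
    results = []
    # One pool, reused for every iteration of the loop.
    with Pool(4, initializer=init_worker, initargs=(values,)) as pool:
        for _ in range(n_rounds):
            results.append(pool.map(function, range(K)))
    return results


if __name__ == '__main__':
    for w in run():
        print(w)
```

Because the initializer runs in every worker regardless of the start method, this should behave the same whether the children are forked or spawned.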
    

    If you do not want every child process to hold its own copy of X (because of memory constraints, say), you can store X in a manager, which serves it to the processes through a proxy instead of copying it into each child. To pass more than one argument to the worker function, use pool.starmap instead of pool.map. Lastly, delete the global X statement: it does nothing useful here, since global is only needed when a function assigns to a module-level name.

    from multiprocessing import Pool, Manager
    from numpy.random import choice
    import numpy as np
    
    K = 10
    
    
    def function(X, k):
    
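        # NOTE: this builds a RandomState and discards it, so it does not
        # seed choice(), which draws from NumPy's global generator.
        np.random.RandomState(k)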
    
        aux = [i for i in np.arange(K) if i != k]
        a, b, c = choice(aux, 3, replace=False)
        x = X[a] + 0.7 * (X[b] - X[c])
        return k, x
    
    
    if __name__ == '__main__':
        m = Manager()
        X = m.list(np.arange(K))
    
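        pool = Pool(K)
        # One (X, k) tuple per task; starmap unpacks it into function(X, k).
        args = [(X, val) for val in np.arange(K)]
    
        w = pool.starmap(function, args)
        pool.close()
        pool.join()  # wait for the workers to exit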
    
        print(w)
    

    Output (the draws are random, so the values vary between runs)

    [(0, -1.5999999999999996), (1, 7.3), (2, 4.9), (3, 1.9000000000000004), (4, 5.5), (5, -1.0999999999999996), (6, 4.800000000000001), (7, 7.3), (8, 0.10000000000000053), (9, 4.7)]
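
One detail in both versions above: np.random.RandomState(k) creates a generator object and throws it away, so the choice call is not actually seeded by k. If reproducible per-task draws are the goal, a minimal sketch is to keep the generator and call its own choice method. The helper name draw_indices is an assumption made for illustration.

```python
import numpy as np

K = 10


def draw_indices(k, size=3):
    # A private generator seeded with k: the same k always gives the same draw.
    rng = np.random.RandomState(k)
    aux = [i for i in range(K) if i != k]
    return rng.choice(aux, size, replace=False)


def function(X, k):
    a, b, c = draw_indices(k)
    return k, X[a] + 0.7 * (X[b] - X[c])


if __name__ == '__main__':
    X = np.arange(K)
    # Serial check; the same function can be handed to pool.starmap.
    print([function(X, k) for k in range(K)])
```

Seeding with k alone means every iteration of an outer loop repeats the same draws for a given k, so in a loop the seed would likely need to combine k with the iteration number. In NumPy 1.17 and later, np.random.default_rng(k) is the newer way to build such a generator.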