python, parallel-processing, gpu, pyopencl, particle-filter

Maximum number of parallel processes on a simple CPU/GPU


I am trying to run a particle filter with 3000 independent particles. More specifically, I would like to run 3000 (simple) computations in parallel so that the computation time remains short.

This task is designed for experimental applications on laboratory equipment, so it has to run on a local laptop. I cannot rely on a remote cluster of computers, and the computers that will be used are unlikely to have high-end Nvidia graphics cards. For instance, the computer I am currently working with has an Intel Core i7-8650U CPU and an Intel UHD Graphics 620 GPU.

Calling mp.cpu_count() from Python's multiprocessing library tells me that I have 8 processors, which is far too few for my problem (I need to run several thousand processes in parallel). I therefore looked at GPU-based solutions, and in particular at PyOpenCL. The Intel UHD Graphics 620 GPU is supposed to have only 24 processors; does that mean I can only use it to run 24 processes in parallel at the same time?
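
For reference, a minimal snippet (just an illustration, assuming the pyopencl package is installed; it is not part of my code) that prints what multiprocessing and PyOpenCL report about the hardware:

import multiprocessing as mp
import pyopencl as cl

print('Logical CPUs reported by multiprocessing:', mp.cpu_count())

# List every OpenCL platform/device and the number of compute units it exposes
for platform in cl.get_platforms():
    for device in platform.get_devices():
        print(platform.name, '/', device.name,
              '- max compute units:', device.max_compute_units)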

More generally, is my goal (running 3000 processes in parallel on an ordinary laptop using Python) realistic, and if so, which software solution would you recommend?

EDIT

Here is my pseudo-code. At each time step i, I call the function posterior_update. This function calls the function approx_likelihood 3000 times, independently (once per particle), and approx_likelihood seems hard to vectorize. Ideally, I would like these 3000 calls to take place independently and in parallel.

import numpy as np
import scipy.stats
from collections import Counter
import random
import matplotlib.pyplot as plt
import os
import time

# User's inputs ##############################################################

# Numbers of particles
M_out           = 3000

# Defines a bunch of functions ###############################################

def approx_likelihood(i,j,theta_bar,N_range,q_range,sigma_range,e,xi,M_in):

    # Double sum over the inner-particle indices kk and nn
    N = int(N_range[theta_bar[j,0]] + 1)
    return sum(scipy.stats.norm.pdf(e[i], loc=q_range[theta_bar[j,2]]*kk,
                                    scale=sigma_range[theta_bar[j,3]]) * xi[nn,kk] / M_in
               for kk in range(N) for nn in range(N))
    
def posterior_update(i,T,e,M_out,M_in,theta,N_range,p_range,q_range,sigma_range,tau_range,X,delta_t,ML):
         
    theta_bar = np.zeros([M_out,5], dtype=int)
    x_bar = np.zeros([M_out,M_in,2], dtype=int)
    u = np.zeros(M_out)
    x_tilde = np.zeros([M_out,M_in,2], dtype=int)    
    w = np.zeros(M_out)
    
    # Loop over the outer particles 
    for j in range(M_out):
                    
        # Compute the approximate likelihood u for particle j
        # (theta_bar and xi are filled in by steps omitted from this pseudo-code)
        u[j] = approx_likelihood(i,j,theta_bar,N_range,q_range,sigma_range,e,xi,M_in)
    
    # Keep the maximum-likelihood parameter estimate
    ML[i,:] = theta_bar[np.argmax(u),:]
    # Compute the normalized weights w
    w = u/sum(u)
    # Resample (resample is defined elsewhere in the full code)
    X[i,:,:,:],theta[i,:,:] = resample(M_out,w,x_tilde,theta_bar)
       
    return X, theta, ML

# Loop over time #############################################################
    
for i in range(T):
    
    print('Progress {0}%'.format(round((i/T)*100,1)))
        
    X, theta, ML = posterior_update(i,T,e,M_out,M_in,theta,N_range,p_range,q_range,sigma_range,tau_range,X,delta_t,ML)

Solution

  • These are some ideas, not an answer to your question:

    import multiprocessing as mp

    def f(j):
        # Wrapper that fixes everything except the particle index j
        return approx_likelihood(i, j, theta_bar, N_range, q_range, sigma_range, e, xi, M_in)

    with mp.Pool() as pool:
        # Spread the M_out independent calls over all available cores;
        # chunksize batches the work to limit inter-process overhead
        u = pool.map(f, range(M_out), chunksize=50)
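
  • One caveat on the sketch above: pool.map can only use f if it is picklable, which in practice means f must be defined at module level, and with the default "spawn" start method on Windows and macOS the workers may not see the current values of i, theta_bar, etc. as globals the way a forked worker on Linux does. A variant that passes those arguments explicitly with functools.partial (still only a sketch; one_particle and worker are illustrative names, and it assumes approx_likelihood is defined at module level) could look like this:

    import multiprocessing as mp
    from functools import partial

    def one_particle(j, i, theta_bar, N_range, q_range, sigma_range, e, xi, M_in):
        # Module-level wrapper so the pool workers can pickle it
        return approx_likelihood(i, j, theta_bar, N_range, q_range, sigma_range, e, xi, M_in)

    if __name__ == '__main__':
        # Bind every argument except the particle index j
        worker = partial(one_particle, i=i, theta_bar=theta_bar, N_range=N_range,
                         q_range=q_range, sigma_range=sigma_range, e=e, xi=xi, M_in=M_in)
        with mp.Pool() as pool:
            u = pool.map(worker, range(M_out), chunksize=50)

  • On the "hardly vectorizable" point: if approx_likelihood really is exactly the double sum shown in your pseudo-code, the norm.pdf term depends only on kk, so the inner sum over nn can be collapsed and the whole thing computed with NumPy, no Python loop at all (which may help more than parallelizing the outer loop, and can be combined with it). A sketch under that assumption:

    import numpy as np
    import scipy.stats

    def approx_likelihood_vec(i, j, theta_bar, N_range, q_range, sigma_range, e, xi, M_in):
        N = int(N_range[theta_bar[j, 0]] + 1)
        kk = np.arange(N)
        # Vector of norm.pdf values, one entry per kk
        pdf = scipy.stats.norm.pdf(e[i],
                                   loc=q_range[theta_bar[j, 2]] * kk,
                                   scale=sigma_range[theta_bar[j, 3]])
        # sum_kk pdf[kk] * sum_nn xi[nn, kk], divided by M_in
        return pdf.dot(xi[:N, :N].sum(axis=0)) / M_in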