python-multiprocessing montecarlo

How can I easily parallelize my Monte Carlo simulations?


I know there are many posts on here about this specific thing, but I am not a programmer by any stretch of the imagination, so I keep getting caught up in the various implementations, and I can't quite piece together a solution for my specific situation. In its simplest form, my simulation looks like this:

import numpy as np
import matplotlib.pyplot as plt

def simulation_trial(a, b, c, d, n):
    rng = np.random.default_rng()
    err = rng.normal(0,1, 1)
    #Do some calculations with err and a,b,c,d
    print("completed trial n")
    plt.plot(some_data[0], some_data[1])
    np.savez("./data", some_data)
    return 0

So I would like to run simulation_trial a few hundred times with the same args (a,b,c,d), where n is just an index to keep track of which trials are which (I don't know if that's actually necessary). The only difference between the trials is a new random number for err. It should also be noted that the function doesn't return anything; it just plots the trial and saves the trial data in a local file. I'm sure that isn't good coding practice, but that's just how it is right now.

At first, I was running something like

for n in range(200):
     simulation_trial(a,b,c,d,n)

but that ends up taking several hours to run. I know this should be an "embarrassingly parallel" task, but I can't seem to piece together how to implement it in Python. My most recent attempt looks like

N = 200

with multiprocessing.Pool(os.cpu_count()) as pool:
    pool.apply_async(simulation_trial, (a, b, c, d, range(N)))

but unsurprisingly, that just outputs a blank plot. I would greatly appreciate any suggestions for implementing multiprocessing in this situation. It doesn't need to be fancy or optimized; I just want to be able to run this simulation in a reasonable amount of time.


Solution
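
As for why the apply_async attempt produced nothing: apply_async returns immediately with an AsyncResult, and leaving the with block terminates the pool, so the task is killed before it finishes. The call also passes the whole range(N) as n, so it submits a single trial instead of N. If you want to stay with apply_async, a minimal sketch looks like this (trial and run_with_apply_async are hypothetical stand-ins, not your real simulation):

```python
import multiprocessing


def trial(a, b, c, d, n):
    # Hypothetical stand-in for simulation_trial; it just combines its inputs.
    return a + b + c + d + n


def run_with_apply_async(a, b, c, d, n_trials, processes=None):
    with multiprocessing.Pool(processes) as pool:
        # One task per trial index, rather than one task that receives range(N).
        pending = [
            pool.apply_async(trial, (a, b, c, d, n)) for n in range(n_trials)
        ]
        # get() blocks until that task has finished, so the pool stays alive
        # while the work runs. It also re-raises any exception from the worker.
        return [p.get() for p in pending]


if __name__ == "__main__":
    print(run_with_apply_async(1, 2, 3, 4, n_trials=5))
```

The starmap version below does the same thing with less bookkeeping.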

  • The correct translation of

    for n in range(N):
         simulation_trial(a,b,c,d,n)
    

    is

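    with multiprocessing.Pool() as pool:  # defaults to os.cpu_count() workers
        # starmap unpacks each tuple into the five positional arguments
        pool.starmap(simulation_trial, ((a, b, c, d, n) for n in range(N)))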
    

    Also, each trial needs its own file name, otherwise every trial overwrites the same ./data.npz file:

    def simulation_trial(a, b, c, d, n):
        rng = np.random.default_rng()
        err = rng.normal(0,1, 1)
        #Do some calculations with err and a,b,c,d
        print(f"completed trial {n}")
        np.savez(f"./data{n}", some_data)
    
    

    Lastly, drop the plotting from the multiprocessed function; it won't work as you expect. Either read the saved files and plot after the multiprocessing is done, or return the result array to the main process and plot it there.
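
Putting these pieces together, here is a minimal sketch rather than a drop-in replacement. The polynomial inside simulation_trial is only a placeholder for your real calculations, and run_trials, plot_results, the out_dir parameter and trials.png are names invented for this example. The `if __name__ == "__main__":` guard matters on platforms that start workers by spawning a fresh interpreter (the default on Windows and macOS), because each worker re-imports the script.

```python
import multiprocessing
from pathlib import Path

import numpy as np


def simulation_trial(a, b, c, d, n, out_dir="."):
    """Run one trial, save it to its own file, and return the data."""
    rng = np.random.default_rng()  # seeded from fresh OS entropy on every call
    err = rng.normal(0, 1, 1)
    # Placeholder for the real calculations (an assumption for this sketch).
    x = np.linspace(0.0, 1.0, 50)
    y = a * x**3 + b * x**2 + c * x + d + err
    some_data = np.vstack([x, y])
    np.savez(Path(out_dir) / f"data{n}", some_data)  # writes data{n}.npz
    return some_data


def run_trials(a, b, c, d, n_trials, out_dir=".", processes=None):
    """Run all trials in a worker pool and return their data in trial order."""
    args = [(a, b, c, d, n, out_dir) for n in range(n_trials)]
    with multiprocessing.Pool(processes) as pool:
        # starmap blocks until every trial has finished.
        return pool.starmap(simulation_trial, args)


def plot_results(results, filename="trials.png"):
    """Plot every trial in the main process and save a single figure."""
    # Imported lazily so the worker processes never need to load matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for some_data in results:
        ax.plot(some_data[0], some_data[1])
    fig.savefig(filename)
    plt.close(fig)


if __name__ == "__main__":
    results = run_trials(1.0, 2.0, 3.0, 4.0, n_trials=200)
    plot_results(results)
```

starmap returns the results in the same order as the argument list, so results[n] belongs to trial n even though the trials finish in an arbitrary order.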