I am working on a dual-processor Windows machine and am trying to run several independent Python processes using the multiprocessing library. Of course, I am aiming to maximize the use of both CPUs in order to speed up computation time. My machine has two CPUs with 36 logical cores each, for 72 in total.
I execute a master script using Python 3.6, which then spawns 72 memory-independent workers via the multiprocessing library. Initially, all 72 cores of my machine are used at 100%. After about 5-10 minutes, however, all 36 cores on the second CPU drop to 0% usage, while the 36 cores on the first CPU remain at 100%. I can't figure out why this is happening.
Is there something I am missing regarding the utilization of both CPUs in a dual-processor Windows machine? How can I ensure that the full potential of my machine is utilized? As a side note, I'm curious whether this would be different if I were using a Linux OS. Thank you in advance to anyone who is willing to help with this.
A representation of my Python master script is below:
import pandas as pd
import netCDF4 as nc
from multiprocessing import Pool

WEATHERDATAPATH = "C:/Users/..../weatherdata/weatherfile_%s.nc4"
OUTPUTPATH = "C:/Users/....outputs/result_%s.nc4"

def calculationFunction(year):
    dataset = nc.Dataset(WEATHERDATAPATH % year)

    # Read the data
    data1 = dataset["windspeed"][:]
    data2 = dataset["pressure"][:]
    data3 = dataset["temperature"][:]
    timeindex = nc.num2date(dataset["time"][:], dataset["time"].units)

    # Do computations with the data, primarily relying on NumPy
    data1Mean = data1.mean(axis=1)
    data2Mean = data2.mean(axis=1)
    data3Mean = data3.mean(axis=1)

    # Write result to a file
    result = pd.DataFrame({"windspeed": data1Mean,
                           "pressure": data2Mean,
                           "temperature": data3Mean},
                          index=timeindex)
    result.to_csv(OUTPUTPATH % year)

if __name__ == '__main__':
    pool = Pool(72)

    results = []
    for year in range(1900, 2016):
        results.append(pool.apply_async(calculationFunction, (year,)))

    for r in results:
        r.get()
It turns out the issue was with NumPy. As this solution explains, NumPy and several other similar packages rely on a BLAS library (OpenBLAS in my case) for their numerical operations. To speed those up, OpenBLAS uses multithreading, and by default it also pins its threads to specific cores by setting CPU affinity. Because every worker process loads the same library, they all end up pinned to the same set of cores, which forces the NumPy operations (which in my original code don't begin until the middle, as I've indicated) onto the first CPU.
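One way to see the pinning for yourself (a minimal sketch, not from my original code; it assumes the third-party psutil package is installed, and report_affinity is just an illustrative name) is to print each worker's CPU-affinity mask after NumPy has been loaded:

import os
from multiprocessing import Pool

import psutil  # third-party: pip install psutil

def report_affinity(worker_id):
    import numpy  # loading NumPy pulls in the BLAS library, triggering any affinity setting
    # cpu_affinity() with no argument returns the logical cores this
    # process is currently allowed to run on
    cores = psutil.Process(os.getpid()).cpu_affinity()
    return worker_id, cores

if __name__ == '__main__':
    with Pool(4) as pool:
        for worker_id, cores in pool.map(report_affinity, range(4)):
            print("worker %s -> allowed cores %s" % (worker_id, cores))

If the pinning is happening, the printed lists will be a narrow, identical subset of cores rather than all of them.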
The solution is to turn off this affinity binding in the BLAS library. I'm not sure whether doing so impacts performance, but in this case I think it will be okay. Luckily it is easy to do: I only had to set a single environment variable, which I did directly in my Python code, before NumPy is imported:
import os
os.environ["OPENBLAS_MAIN_FREE"] = "1"  # must be set before NumPy (and thus OpenBLAS) is loaded
Now the machine runs at full capacity for the entire computation :)
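As a side note, instead of (or in addition to) disabling the affinity binding, you can disable BLAS multithreading altogether by limiting each worker to one BLAS thread; since each of my workers already saturates a core, extra BLAS threads buy nothing here anyway. A sketch of the usual variables (the exact set that matters depends on which BLAS backend NumPy was built against):

import os

# One BLAS/OpenMP thread per worker process. Each variable targets a
# different backend (OpenBLAS, Intel MKL, generic OpenMP); setting all
# three is harmless. As above, they must be set before NumPy is imported.
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OMP_NUM_THREADS"] = "1"

import numpy as np  # NumPy (and its BLAS) load only after the variables are set

Because Windows workers are spawned rather than forked, putting these lines at the very top of the master script means they are re-executed in every worker before NumPy loads there as well.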