I have trained a CNN model on a GPU using FastAI (PyTorch backend). I am now trying to use that model for inference on the same machine, but on the CPU instead of the GPU. Along with that, I am also trying to make use of multiple CPU cores via the multiprocessing module. Here is the issue:
Running the code on a single CPU (without multiprocessing) takes only 40 seconds to process nearly 50 images.
Running the code on multiple CPUs using torch.multiprocessing takes more than 6 minutes to process the same 50 images.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = ""   # hide the GPU so inference runs on the CPU

import torch
from torch.multiprocessing import Pool, set_start_method
from fastai.vision import *
from fastai.text import *

defaults.device = torch.device('cpu')
def process_image_batch(batch):
    # load the exported learner inside each worker and put it in eval mode
    learn_cnn = load_learner(scripts_folder, 'cnn_model.pkl')
    learn_cnn.model.training = False
    learn_cnn.model = learn_cnn.model.eval()
    # for image in batch:
    #     prediction = ...  # predicting the image here
    # return prediction
if __name__ == '__main__':
    #
    # image_batches = .....  # retrieving the image batches (It is a list of 5 lists)
    # n_processes = 5
    set_start_method('spawn', force=True)
    try:
        pool = Pool(n_processes)
        pool.map(process_image_batch, image_batches)
    except Exception as e:
        print('Main Pool Error: ', e)
    except KeyboardInterrupt:
        exit()
    finally:
        pool.terminate()
        pool.join()
I am not sure what is causing this slowdown in multiprocessing mode. I have read a lot of posts discussing a similar issue but could not find a proper solution anywhere.
The solution turned out to be forcing PyTorch to use only one thread per process, as below. Without this, every worker spawns its own intra-op thread pool, so the 5 processes oversubscribe the available cores and spend most of their time contending with each other:
torch.set_num_threads(1)
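As a minimal sketch of where that call can go (assuming the fastai v1 API used in the question, the scripts_folder name from the question, and a hypothetical prediction loop built on open_image/predict), it is placed at the top of the worker so each pool process is limited to one intra-op thread:

import torch
from fastai.vision import *          # same star import the question uses; brings in load_learner/open_image

scripts_folder = '.'                 # hypothetical: wherever cnn_model.pkl was exported

def process_image_batch(batch):
    # limit this worker to a single intra-op thread so the pool's
    # processes do not oversubscribe the CPU cores
    torch.set_num_threads(1)

    # load the exported learner once per worker, in eval mode
    learn_cnn = load_learner(scripts_folder, 'cnn_model.pkl')
    learn_cnn.model.eval()

    # hypothetical prediction loop (the question elides this part)
    predictions = []
    for image_path in batch:
        img = open_image(image_path)
        pred_class, pred_idx, probs = learn_cnn.predict(img)
        predictions.append(pred_class)
    return predictions

Setting the OMP_NUM_THREADS=1 environment variable before the worker processes start should have a similar effect.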