Tags: python, multiprocessing, with-statement, process-pool

multiprocessing returns "too many open files" but using `with...as` fixes it. Why?


I was using this answer in order to run parallel commands with multiprocessing in Python on a Linux box.

My code did something like:

import multiprocessing
import logging

logger = logging.getLogger(__name__)

def cycle(offset):
    # Do stuff
    pass

def run():
    for nprocess in process_per_cycle:
        logger.info("Start cycle with %d processes", nprocess)
        offsets = list(range(nprocess))
        pool = multiprocessing.Pool(nprocess)
        pool.map(cycle, offsets)

But I was getting this error: OSError: [Errno 24] Too many open files
So the code was opening too many file descriptors, i.e. it was starting too many processes and never terminating them.

I fixed it by replacing the last two lines with these:

    with multiprocessing.Pool(nprocess) as pool:
        pool.map(cycle, offsets)

But I do not know exactly why those lines fixed it.

What is happening underneath that with?


Solution

  • You're creating new pools inside a loop and never closing them once you're done with them. Each pool keeps open pipes (file descriptors) to its worker processes, so eventually you hit the per-process limit on open files. This is a bad idea.

    You could fix this by using a context manager, which automatically calls pool.terminate() when the block exits, or by calling pool.terminate() yourself. Alternatively, why don't you create a pool outside the loop just once, and then send tasks to the processes inside it?
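    Calling pool.terminate() yourself at the end of each iteration (the second option above) would look like this sketch, where cycle and process_per_cycle are stand-ins for the questioner's own definitions:

    import multiprocessing

    def cycle(offset):
        return offset * 2  # stand-in for the real per-process work

    process_per_cycle = [2, 4]  # hypothetical values for illustration

    results = []
    for nprocess in process_per_cycle:
        pool = multiprocessing.Pool(nprocess)
        try:
            results.append(pool.map(cycle, range(nprocess)))
        finally:
            pool.terminate()  # release the workers and their pipes each iteration

    print(results)  # [[0, 2], [0, 2, 4, 6]]

    The try/finally guarantees the descriptors are released even if cycle raises, which is exactly what the context-manager form gives you for free.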

    pool = multiprocessing.Pool()  # initialise your pool once, outside the loop
    for nprocess in process_per_cycle:
        ...
        pool.map(cycle, offsets) # delegate work inside your loop

    pool.close() # shut down the pool
    pool.join()  # wait for the workers to exit
    

    For more information, you could peruse the multiprocessing.Pool documentation.
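
    As for what happens underneath the with: Pool.__exit__ simply calls terminate(), so the questioner's fix is roughly equivalent to this try/finally (a minimal sketch with trivial work in place of cycle):

    import multiprocessing

    # `with multiprocessing.Pool(n) as pool: ...` is roughly sugar for:
    pool = multiprocessing.Pool(2)
    try:
        result = pool.map(len, ["a", "bb", "ccc"])
    finally:
        pool.terminate()  # Pool.__exit__ calls terminate(), closing the workers' pipes and fds

    print(result)  # [1, 2, 3]

    That is why the with version never accumulates descriptors: every pass through the block tears the pool down before the next one starts.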