python multiprocessing minimum memory-efficient

Multiprocessing Pool: return the minimum element


I want to run a task with a multiprocessing.Pool and keep only the minimum element, without using the memory needed to store every output.

My code so far:

from multiprocessing import Pool

with Pool() as pool:
  programs = pool.map(task, groups)  # builds the full list of results in memory
  shortest, l = min(programs, key=lambda a: len(a[0]))

This works, but the list returned by pool.map() occupies a lot of memory: groups is a set that can be really big, and every result is held until all tasks have finished.

I would like some kind of approach like this:

with Pool() as pool:
  shortest, l = pool.execute_and_return_min(task, groups, key = lambda a: len(a[0]))

(which would internally compare the results and return the smallest element)

or:

with Pool() as pool:
  shortest = l = None
  for program, k in pool.apply_and_return_as_generator(task, groups):
    if shortest is None or len(program) < len(shortest):
      shortest = program
      l = k

(which would work like the normal pool but return values from the generator as soon as they are computed)

I couldn't find any method of the pool to achieve something like this. Since I only want the minimum element, I do not care about the order of execution. Maybe I was not careful enough when searching.

Any help would be appreciated. A solution using Pool() is preferred, but if you have an idea how to implement this with another technique, please go ahead as well.

Thanks in advance!


Solution

  • After reading the comments, I found a solution that works well for me.

    As @Robin De Schepper pointed out, there is the imap method, which is a lazy version of the map method. Even better for me was the imap_unordered method, which I went with in the end.

    The upside of a lazy iterator is that results become available before all items have been processed, so they can be consumed one at a time instead of being collected in a list. The unordered variant goes further: it yields each result as soon as it is ready, regardless of input order. Since I do not need to preserve the order, this was the best method to use.

    Final solution:

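    from multiprocessing import Pool

    with Pool() as pool:
      shortest = l = None
      # Results arrive one at a time, in completion order
      for program, k in pool.imap_unordered(task, groups):
        # Keep only the best candidate seen so far
        if shortest is None or len(program) < len(shortest):
          shortest = program
          l = k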
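
Because imap_unordered returns an ordinary iterator, the loop above can also be written with the built-in min(), which is close to the one-liner asked for in the question. The following is only a minimal sketch: task here is a hypothetical stand-in that returns a (program, k) pair, pick_shortest is a helper name made up for illustration, and the sample groups are invented.

```python
from multiprocessing import Pool


def task(group):
    # Hypothetical stand-in for the real task: returns a (program, k) pair.
    program = group.replace(" ", "")
    return program, len(program)


def pick_shortest(results):
    # min() pulls items from the iterator one by one and keeps only the
    # best candidate seen so far, just like the explicit loop.
    return min(results, key=lambda r: len(r[0]))


if __name__ == "__main__":
    groups = {"a b c d", "a b", "a b c"}  # made-up sample input
    with Pool(2) as pool:
        # Consume the lazy iterator inside the with block, while the
        # pool is still alive.
        shortest, l = pick_shortest(pool.imap_unordered(task, groups))
    print(shortest, l)  # prints "ab 2"
```

Note that results which have been computed but not yet consumed are still buffered by the pool, so this only bounds memory as long as the consuming side keeps up with the workers. With many small tasks, the optional chunksize argument of imap_unordered may be worth trying to reduce inter-process overhead.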