Tags: python, parallel-processing, smp

Is there a simple process-based parallel map for python?


I'm looking for a simple process-based parallel map for Python, that is, a function

parmap(function, [data])

that would run function on each element of [data] in a different process (well, on a different core, but AFAIK the only way to run code on different cores in Python is to start multiple interpreter processes) and return a list of results.

Does something like this exist? I would like something simple, so a simple module would be nice. Of course, if no such thing exists, I will settle for a big library :-/


Solution

  • It seems like what you need is the map method of multiprocessing.Pool:

    map(func, iterable[, chunksize])

    A parallel equivalent of the map() built-in function (it supports only
    one iterable argument though). It blocks until the result is ready.

    This method chops the iterable into a number of chunks which it submits
    to the process pool as separate tasks. The (approximate) size of these
    chunks can be specified by setting chunksize to a positive integer.
    

    For example, if you wanted to map this function:

    def f(x):
        return x**2
    

    to range(10), you could do it using the built-in map() function (on
    Python 3, map() returns an iterator, so wrap it in list() to get a
    list of results):

    list(map(f, range(10)))
    

    or using the map() method of a multiprocessing.Pool object:

    import multiprocessing

    if __name__ == "__main__":  # guard so worker processes can safely re-import this module
        with multiprocessing.Pool() as pool:  # defaults to one worker per CPU core
            print(pool.map(f, range(10)))
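
    If you want the exact parmap(function, [data]) signature from the
    question, a thin wrapper around Pool.map will do. Here is a minimal
    sketch (the parmap name and its processes/chunksize parameters are
    choices of this example, not part of the multiprocessing API):

    import multiprocessing

    def parmap(function, data, processes=None, chunksize=None):
        # processes=None lets the Pool default to os.cpu_count() workers
        with multiprocessing.Pool(processes) as pool:
            # Pool.map blocks until every result is ready and preserves
            # input order; chunksize is passed straight through to Pool.map
            return pool.map(function, data, chunksize)

    def f(x):
        return x**2

    if __name__ == "__main__":
        print(parmap(f, range(10)))  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

    One caveat: function has to be picklable (typically defined at module
    top level), because multiprocessing serializes it to send it to the
    worker processes; a lambda, for example, will not work.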