python multiprocessing joblib

What does the delayed() function do (when used with joblib in Python)?


I've read through the documentation, but I don't understand what is meant by: "The delayed function is a simple trick to be able to create a tuple (function, args, kwargs) with a function-call syntax."

I'm using it to iterate over the list I want to operate on (allImages) as follows:

from joblib import Parallel, delayed

def joblib_loop():
    return Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages)

This returns my HOG features, as I want (and with the speed gain from using all 8 of my cores), but I'm not sure what it is actually doing.

My Python knowledge is alright at best, and it's very possible that I'm missing something basic. Any pointers in the right direction would be most appreciated.


Solution

  • Perhaps things become clearer if we look at what would happen if instead we simply wrote

    Parallel(n_jobs=8)(getHog(i) for i in allImages)
    

    which, in this context, could be expressed more naturally as:

    1. Create a Parallel instance with n_jobs=8
    2. Build the sequence of values getHog(i) for i in allImages
    3. Pass that sequence to the Parallel instance

    What's the problem? Every getHog(i) call is evaluated in the main process, one after another, and all the Parallel object ever receives is the finished results, so there's nothing left to execute in parallel! (Because the argument is a generator expression, the calls run as Parallel iterates over it rather than up front, but they still run sequentially in the main process.)
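    The point above can be checked without joblib at all. The following is a minimal sketch; get_hog, all_images and calls_made are hypothetical placeholders, and list() merely stands in for whatever consumer iterates over the generator.

```python
# Hypothetical stand-in for the real feature extractor; it records every call.
calls_made = []

def get_hog(image):
    calls_made.append(image)
    return image * 2  # placeholder "feature"

all_images = [1, 2, 3]  # placeholder data

# A generator expression of plain calls, shaped like (getHog(i) for i in allImages).
plain_calls = (get_hog(i) for i in all_images)

# Whoever iterates over it (here: list) triggers the calls itself, in the
# current process, and only ever sees the finished results.
received = list(plain_calls)

print(received)                            # [2, 4, 6]: results, not tasks
print(any(callable(r) for r in received))  # False: nothing left to hand to a worker
```

    The consumer ends up with values only, so it has no way to move the work elsewhere.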

    What we actually want is to tell Python what functions we want to call with what arguments, without actually calling them - in other words, we want to delay the execution.

    This is what delayed conveniently allows us to do, with clear syntax. If we want to tell Python that we'd like to call foo(2, g=3) sometime later, we can simply write delayed(foo)(2, g=3). This returns the tuple (foo, (2,), {'g': 3}), containing the function to call, its positional arguments (args), and its keyword arguments (kwargs).
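    To make the idea concrete, here is a simplified stand-in for delayed written in plain Python. It is a sketch of the technique only, not joblib's actual source; my_delayed and foo are hypothetical names chosen for illustration.

```python
def my_delayed(function):
    """Simplified stand-in for joblib.delayed (an assumption, not joblib's code)."""
    def capture(*args, **kwargs):
        # Do not call `function`; just record what the call would have been.
        return function, args, kwargs
    return capture

def foo(x, g=0):  # hypothetical example function
    return x + g

task = my_delayed(foo)(2, g=3)   # foo has NOT been called yet
print(task[1], task[2])          # (2,) {'g': 3}

# Later, whoever holds the tuple can turn it back into the real call.
func, args, kwargs = task
print(func(*args, **kwargs))     # 5
```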

    So, by writing Parallel(n_jobs=8)(delayed(getHog)(i) for i in allImages), instead of the above sequence, now the following happens:

    1. A Parallel instance with n_jobs=8 gets created

    2. The list

       [delayed(getHog)(i) for i in allImages]
      

      gets created, evaluating to

       [(getHog, (img1,), {}), (getHog, (img2,), {}), ... ]
      
    3. That list is passed to the Parallel instance

    4. The Parallel instance starts 8 workers (separate processes with joblib's default backend, rather than threads) and distributes the tuples from the list to them

    5. Finally, each worker executes the tuples it receives: it calls the first element with the second and third elements unpacked as arguments, tup[0](*tup[1], **tup[2]), turning each tuple back into the call we actually intended, e.g. getHog(img2).
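    The five steps can be imitated with the standard library alone. This is a toy sketch under stated assumptions: my_delayed, run_task, toy_parallel, get_hog and all_images are hypothetical names, and a ThreadPoolExecutor stands in for joblib's workers only to keep the example self-contained. It is not how joblib is implemented.

```python
from concurrent.futures import ThreadPoolExecutor

def my_delayed(function):
    # Simplified stand-in for joblib.delayed: capture the call, do not run it.
    def capture(*args, **kwargs):
        return function, args, kwargs
    return capture

def run_task(task):
    # Step 5: unpack the tuple and perform the call that was delayed.
    function, args, kwargs = task
    return function(*args, **kwargs)

def toy_parallel(tasks, n_jobs):
    # Steps 1 and 4: create the workers and hand the tuples out to them.
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(run_task, tasks))  # map keeps the input order

def get_hog(image):  # hypothetical placeholder for the real feature extractor
    return sum(image)

all_images = [[1, 2], [3, 4], [5, 6]]  # placeholder data

# Steps 2 and 3: build the (function, args, kwargs) tuples and pass them on.
features = toy_parallel((my_delayed(get_hog)(i) for i in all_images), n_jobs=2)
print(features)  # [3, 7, 11]
```

    The key design point is that the workers, not the caller, perform the final function(*args, **kwargs) call.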