
How can I guarantee the order of execution/presentation of tasks in a GPars thread pool?


I'm running a set of tasks using a GPars thread pool. Task execution times vary widely, from a few seconds to 20 minutes. (These are Cucumber feature files, FWIW.)

As luck would have it, the last task in the features list takes the longest to run, so the whole process sits there executing runtest('australian_government_rebate.feature') for 25 minutes after all the other threads have finished.

As a result, multi-threading isn't living up to its promise: the single-threaded run takes 65 minutes and the multi-threaded run 48 minutes. I was hoping for 30 minutes or better.

My solution is to sort the feature files by their previous execution time, longest first:

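import groovyx.gpars.GParsPool

features = ...
// Longest-running features first (sort(Closure) sorts the list in place)
features.sort { a, b -> b.executionTime() <=> a.executionTime() }
GParsPool.withPool(noOfCores) {
    features.eachParallel { feature ->
        runtest(feature)
    }
}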
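To see why the sort helps, it can be modelled as greedy list scheduling: tasks are taken in list order and each goes to whichever worker becomes free first. Putting the longest tasks first is the classic "longest processing time first" (LPT) rule. The sketch below is a minimal simulation under those assumptions; the `makespan` helper and the durations are hypothetical, not the question's real data, and it assumes tasks really are started in list order, which is exactly the property the question asks about.

```groovy
// Greedy list scheduling: tasks are taken in list order, and each one goes to
// the worker that becomes free first. Returns when the last task finishes.
int makespan(List<Integer> tasks, int workers) {
    List<Integer> freeAt = [0] * workers          // time at which each worker becomes idle
    for (d in tasks) {
        int w = (0..<workers).min { freeAt[it] }  // index of the earliest-free worker
        freeAt[w] += d
    }
    return freeAt.max()
}

// Hypothetical durations in minutes, with the longest task last
List<Integer> durations = [10, 10, 10, 10, 20]
// sort(false) returns a sorted copy, longest first
List<Integer> longestFirst = durations.sort(false) { a, b -> b <=> a }

println "original order: ${makespan(durations, 2)} min"     // 40
println "longest first:  ${makespan(longestFirst, 2)} min"  // 30
```

With two workers, the original order leaves one worker running the 20-minute task alone at the end, while longest-first reaches the ideal of 60 / 2 = 30 minutes.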

My question is this: can I guarantee that the features will be handed to the GParsPool in the order in which they occur in features?


Solution

  • For cases like this I'd recommend using dataflow tasks, started from a sequential for loop over the sorted collection of features, instead of parallel collections, which make no promise about the order in which elements are processed:

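    import groovyx.gpars.group.PGroup
    PGroup group = ...
    // task closures are called without arguments, so capture a per-iteration copy instead of `it`
    for (f in features) { def feature = f; group.task { runtest(feature) } }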
    

    This would guarantee the start-up order you intend, since the tasks are submitted to the group one at a time, in sorted order.
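A fuller, self-contained sketch of the same approach follows. Assumptions not taken from the answer: GPars 1.2.1 fetched via `@Grab`, a `DefaultPGroup` with two threads, and hypothetical `features`/`runtest` stand-ins that simply sleep. The sequential loop fixes the order in which tasks are submitted; collecting the returned promises lets the script wait for all of them before shutting the group down. Note that this orders task start-up, not completion.

```groovy
@Grab('org.codehaus.gpars:gpars:1.2.1')
import groovyx.gpars.group.DefaultPGroup
import groovyx.gpars.group.PGroup

// Hypothetical stand-ins for the question's feature list and runtest()
def features = [[name: 'short', millis: 50], [name: 'long', millis: 300], [name: 'medium', millis: 150]]
def finished = Collections.synchronizedList([])
def runtest = { Map feature ->
    Thread.sleep(feature.millis as long)   // simulate the feature's running time
    finished << feature.name
}

PGroup group = new DefaultPGroup(2)        // assumption: a pool of two threads
try {
    // Longest first, as in the question; sort(false) returns a sorted copy
    def sorted = features.sort(false) { a, b -> b.millis <=> a.millis }
    def promises = []
    for (f in sorted) {                    // sequential loop: tasks are submitted in sorted order
        def feature = f                    // per-iteration copy for the task closure to capture
        promises << group.task { runtest(feature) }
    }
    promises*.get()                        // block until every task has completed
} finally {
    group.shutdown()                       // release the group's threads
}
```

Keeping the promises is a design choice: without waiting on them, the script could reach `shutdown()` while tasks are still queued.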