concurrency, parallel-processing, software-estimation

How to calculate time estimate for parallel tasks?


I need to calculate the total amount of time for a certain number of tasks to be completed. Details:

How can I calculate the total time it will take to process all tasks, given the concurrency? I know it will take at least as long as the longest task (25 seconds), but is there a formula or method to compute a rough overall estimate that scales as more tasks are added?


Solution

  • If you don't mind some approximation, it can be quite simple. If the tasks take roughly the same time to complete, you can use the average task duration as a basis (here, 20 seconds).

    Assuming that the system always has tasks queued, that individual task durations are short relative to the total, that there are many tasks, and that the concurrency level is high enough, then:

    estimated_duration = average_task_duration * nb_tasks / nb_workers

    Where nb_workers is the number of concurrent threads.

    Here is some Python code that shows the idea:

    from random import random
    from time import sleep, monotonic
    from concurrent.futures import ThreadPoolExecutor

    def task(i: int, duration: float) -> None:
        """Simulate task number `i` taking `duration` seconds."""
        sleep(duration)

    def main() -> None:
        nb_tasks = 20
        nb_workers = 3
        average_task_duration = 2.0
        expected_duration = nb_tasks * average_task_duration / nb_workers

        # Durations spread uniformly in [1.5, 2.5] s around the average.
        durations = [average_task_duration + (random() - 0.5) for _ in range(nb_tasks)]

        print(f"Starting work... Expected duration: {expected_duration:.2f} s")
        start = monotonic()
        with ThreadPoolExecutor(max_workers=nb_workers) as executor:
            for i, d in enumerate(durations):
                executor.submit(task, i, d)
            # Leaving the `with` block waits for all submitted tasks to finish.
        stop = monotonic()

        print(f"Elapsed: {(stop - start):.2f} s")

    if __name__ == "__main__":
        main()
    
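Note that the average-based formula can underestimate when one long task dominates. A safer estimate (a standard scheduling lower bound, not something from the original answer) is the maximum of the average-based value and the longest single task, since no schedule can finish before the longest task does. A minimal sketch:

```python
def estimated_duration(durations: list[float], nb_workers: int) -> float:
    """Lower bound on the makespan: no schedule finishes before the
    longest task ends, nor before the total work, split evenly across
    all workers, is done."""
    return max(max(durations), sum(durations) / nb_workers)

# Using the question's example: longest task 25 s, total 90 s, 3 workers.
print(estimated_duration([25, 20, 20, 15, 10], 3))  # → 30.0
```

With few workers the longest task rarely dominates, but as `nb_workers` grows the `max(durations)` term becomes the binding constraint.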

    If these assumptions do not hold in your case, you are better off using a Bin Packing algorithm, as Jerôme suggested.