amazon-web-services, databricks, azure-databricks, aws-databricks, databricks-community-edition

Execute multiple notebooks in parallel in PySpark on Databricks


The question is simple:

master_dim.py calls dim_1.py and dim_2.py so that they execute in parallel. Is this possible in Databricks PySpark?

The image below shows what I am trying to do; it errors out for some reason. Am I missing something here?

[screenshot of the attempted notebook code and the resulting error]


Solution

  • Just for anyone else wondering how it worked:

    from multiprocessing.pool import ThreadPool

    # Each dbutils.notebook.run call blocks its worker thread until that child notebook finishes or times out.
    pool = ThreadPool(5)
    notebooks = ['dim_1', 'dim_2']
    pool.map(
        lambda path: dbutils.notebook.run("/Test/Threading/" + path, timeout_seconds=60, arguments={"input-data": path}),
        notebooks,
    )
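
  • If you also want to collect each child notebook's return value (whatever it passes to dbutils.notebook.exit) and keep one failure from hiding the others, a concurrent.futures variant of the same idea works too. This is a minimal sketch assuming the same /Test/Threading folder and input-data widget as above; the per-notebook error handling is my own addition, not part of the original answer. Inside each child notebook, the argument arrives as a widget, e.g. dbutils.widgets.get("input-data").

    from concurrent.futures import ThreadPoolExecutor, as_completed

    notebooks = ['dim_1', 'dim_2']

    def run_notebook(path):
        # Returns whatever the child notebook passes to dbutils.notebook.exit()
        return dbutils.notebook.run(
            "/Test/Threading/" + path,
            timeout_seconds=60,
            arguments={"input-data": path},
        )

    results = {}
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = {executor.submit(run_notebook, nb): nb for nb in notebooks}
        for future in as_completed(futures):
            nb = futures[future]
            try:
                results[nb] = future.result()
            except Exception as exc:
                # A failed or timed-out child raises here instead of aborting the whole loop
                results[nb] = f"FAILED: {exc}"

    print(results)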