pyspark | databricks | azure-databricks

Share cluster params between jobs


I have a workflow WF1 in which I trigger another workflow WF2 from task T2. Here is the thing: I know I can pass the output of os.environ.copy() as a parameter in task T2 and call os.environ.update() inside WF2. But imagine any task of WF2 fails. When I repair & run WF2, it will not have the WF1 environment parameters. So my question is: is there any other way to copy env parameters from WF1 to WF2 that supports repair runs & tasks without losing all the variables?
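
For context, a minimal sketch of the approach described above (all names are illustrative):

    # In WF1, task T2: snapshot the environment and serialize it to pass to WF2
    import json
    import os

    envs = dict(os.environ.copy())
    wf_param = json.dumps(envs)   # sent to WF2 as a parameter when triggering it

    # In WF2: restore the forwarded variables
    os.environ.update(json.loads(wf_param))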


Solution

  • No, you will not lose them. Whether you use notebook parameters or job parameters, the parameter value is still available when you repair and run the job.

    Here is a screenshot from my run, showing the parameter value is still present during a repair and run.

    [Screenshot: job parameter value retained during repair & run]

    Read that value in WF2 and update the environment as shown below:

    import json
    import os

    # Job parameter carrying the environment variables passed from WF1.
    # Default of "{}" so json.loads does not fail on a run without the parameter.
    dbutils.widgets.text("wfParam", "{}")

    # Parse the JSON string back into a dict and merge it into this run's environment
    wfParam_value = json.loads(dbutils.widgets.get("wfParam"))
    os.environ.update(wfParam_value)
    

    Here, wfParam is the job parameter you configured on your second workflow (WF2).
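
    For reference, that parameter is declared at the job level of WF2 (in the Workflows UI under the job's Parameters section, or via the Jobs API). A minimal sketch of the relevant fragment of WF2's job settings, assuming a default of "{}" so the notebook code above parses cleanly even when the parameter is not supplied:

    # Sketch of the job-level parameter declaration in WF2's job settings
    # (the default value "{}" is an assumption, not taken from the original setup)
    wf2_settings_fragment = {
        "parameters": [
            {"name": "wfParam", "default": "{}"}
        ]
    }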

    Also, while triggering WF2, pass the environment variables as a JSON string:

    import json
    import requests

    # Assumes host (workspace URL), token (personal access token), job_id (WF2's job id)
    # and envs (a dict of environment variables) are already defined in WF1's task T2
    url = f"{host}/api/2.1/jobs/run-now"
    headers = {"Authorization": f"Bearer {token}"}

    payload = {
        "job_id": job_id,
        "job_parameters": {
            "wfParam": json.dumps(envs)  # env vars serialized as a JSON string
        }
    }
    response = requests.post(url, headers=headers, json=payload)
    display(response.json())
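
    Here, envs in the payload above is assumed to be a plain dict of string keys and values built in WF1's task T2. A minimal sketch of one way to build it, assuming you only want to forward a few variables (os.environ.copy() also works, but it includes many cluster-internal entries and a very large value can run into API payload size limits):

    import os

    # Hypothetical list of variables to forward from WF1 to WF2
    VARS_TO_SHARE = ["MY_APP_ENV", "MY_FEATURE_FLAG"]

    envs = {name: os.environ[name] for name in VARS_TO_SHARE if name in os.environ}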