I have created a pipeline in Azure DevOps to perform the following three steps:
1. Retrieve the job definition from one Databricks workspace and save it as JSON (Databricks CLI config is omitted):
   databricks jobs get --job-id $(job_id) > workflow.json
2. Use this JSON to update the workflow in a second (separate) Databricks workspace (the Databricks CLI is first reconfigured to point to the new workspace):
   databricks jobs reset --job-id $(job_id) --json-file workflow.json
3. Run the updated job in the second Databricks workspace:
   databricks jobs run-now --job-id $(job_id)
However, my pipeline fails at step 2 with the following error, even though existing_cluster_id is already defined inside workflow.json. Any idea?
Error: b'{"error_code":"INVALID_PARAMETER_VALUE","message":"One of job_cluster_key, new_cluster, or existing_cluster_id must be specified."}'
Here is what my workflow.json looks like (some details hidden):
{
  "job_id": 123,
  "creator_user_name": "user1",
  "run_as_user_name": "user1",
  "run_as_owner": true,
  "settings": {
    "name": "my-workflow",
    "existing_cluster_id": "abc-def-123-xyz",
    "email_notifications": {
      "no_alert_for_skipped_runs": false
    },
    "webhook_notifications": {},
    "timeout_seconds": 0,
    "notebook_task": {
      "notebook_path": "notebooks/my-notebook",
      "base_parameters": {
        "environment": "production"
      },
      "source": "GIT"
    },
    "max_concurrent_runs": 1,
    "git_source": {
      "git_url": "https://my-org@dev.azure.com/my-project/_git/my-repo",
      "git_provider": "azureDevOpsServices",
      "git_branch": "master"
    },
    "format": "SINGLE_TASK"
  },
  "created_time": 1676477563075
}
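I figured out that you don't need the entire job definition JSON retrieved in step 1, only its "settings" object. The output of jobs get wraps the settings in metadata (job_id, creator_user_name, and so on), so existing_cluster_id sat one level deeper than jobs reset looks for it. Modifying step 1 as follows solved my issue: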
databricks jobs get --job-id $(job_id) | jq .settings > workflow.json
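If jq is not available on the build agent, the same extraction can be done in a few lines of Python. This is only a sketch: the function names `extract_settings` and `has_cluster_spec` are made up for illustration, the sample is a trimmed copy of the workflow.json above, and the top-level cluster check assumes a single-task job like the one in the question (in a multi-task job the cluster keys would sit inside each task instead).

```python
import json

# Keys named in the INVALID_PARAMETER_VALUE error message; for a
# single-task job one of them must appear at the top level of the
# settings that are sent to the reset call.
CLUSTER_KEYS = ("job_cluster_key", "new_cluster", "existing_cluster_id")


def extract_settings(job_definition: dict) -> dict:
    """Return the "settings" object from `databricks jobs get` output."""
    return job_definition["settings"]


def has_cluster_spec(payload: dict) -> bool:
    """Return True if the payload names a cluster at its top level."""
    return any(key in payload for key in CLUSTER_KEYS)


# Trimmed-down copy of the workflow.json shown in the question.
sample = json.loads("""
{
  "job_id": 123,
  "settings": {
    "name": "my-workflow",
    "existing_cluster_id": "abc-def-123-xyz",
    "format": "SINGLE_TASK"
  },
  "created_time": 1676477563075
}
""")

print(has_cluster_spec(sample))                    # False: the cluster id is nested under "settings"
print(has_cluster_spec(extract_settings(sample)))  # True: this is the shape that worked for reset
```

Running `has_cluster_spec` on the file before calling jobs reset would have flagged the problem in the pipeline itself rather than as an API error.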