I would like to create a databricks job of type "python wheel" in Azure by using databricks API. I have a python wheel that I need to execute in this job.
This question is related to my other question at this stackoverflow link, just the technology used to implement this has changed.
Following the Azure databricks API documentation, I know how to create a databricks job that can execute a notebook. However, what I need is a databricks job of type "python wheel". All my code is implemented in a python wheel and it needs to run 24/7. According to the requirements that I got from the development team, they need a job of type "python wheel" and not "notebook".
As you see, the databricks documentation already shows how a job of type python wheel can be created from the databricks workspace. I, however, need to automate this process in a DevOps pipeline, which is why I would like to do it by making an API call to the databricks API. Below is the code I have implemented to create a databricks job. This code uses a notebook to execute the code. As I mentioned, I need to run a "python wheel" job, just as it is explained here. Below you can see this type of job in the workspace:
My current code is below; my objective is to change it to run a python wheel instead of a notebook:
import requests
import os
# both 2.0 and 2.1 API can create job.
dbrks_create_job_url = "https://"+os.environ['DBRKS_INSTANCE']+".azuredatabricks.net/api/2.1/jobs/create"
DBRKS_REQ_HEADERS = {
    'Authorization': 'Bearer ' + os.environ['DBRKS_BEARER_TOKEN'],
    'X-Databricks-Azure-Workspace-Resource-Id': '/subscriptions/'+ os.environ['DBRKS_SUBSCRIPTION_ID'] +'/resourceGroups/'+ os.environ['DBRKS_RESOURCE_GROUP'] +'/providers/Microsoft.Databricks/workspaces/' + os.environ['DBRKS_WORKSPACE_NAME'],
    'X-Databricks-Azure-SP-Management-Token': os.environ['DBRKS_MANAGEMENT_TOKEN']}
CLUSTER_ID = "\"" + os.environ["DBRKS_CLUSTER_ID"] + "\""
NOTEBOOK_LOCATION = "\"" + os.environ["NOTEBOOK_LOCATION"] + "test-notebook" + "\""
print("Notebook path is {}".format(NOTEBOOK_LOCATION))
print(CLUSTER_ID)
body_json = """
{
"name": "A sample job to trigger from DevOps",
"tasks": [
{
"task_key": "ExecuteNotebook",
"description": "Execute uploaded notebook including tests",
"depends_on": [],
"existing_cluster_id": """ + CLUSTER_ID + """,
"notebook_task": {
"notebook_path": """ + NOTEBOOK_LOCATION + """,
"base_parameters": {}
},
"timeout_seconds": 300,
"max_retries": 1,
"min_retry_interval_millis": 5000,
"retry_on_timeout": false
}
],
"email_notifications": {},
"name": "Run_Unit_Tests",
"max_concurrent_runs": 1}
"""
print("Request body in json format:")
print(body_json)
response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, data=body_json)
if response.status_code == 200:
    print("Job created successfully!")
    print(response.status_code)
    print(response.content)
    print("Job Id = {}".format(response.json()['job_id']))
    print("##vso[task.setvariable variable=DBRKS_JOB_ID;isOutput=true;]{b}".format(b=response.json()['job_id']))
else:
    print("job failed!")
    raise Exception(response.content)
It's simple - instead of notebook_task you just need to use python_wheel_task, as it's described in the REST API docs. You need to provide the package_name and entry_point parameters inside that JSON object. And don't forget to add the wheel file in the libraries block.
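A minimal sketch of what the adapted request body could look like, assuming a hypothetical package name my_package, entry point main, and a wheel uploaded to DBFS - substitute your own values for all three. Building the body with json.dumps instead of string concatenation also avoids the manual quoting used for CLUSTER_ID and NOTEBOOK_LOCATION in the original code:

```python
import json
import os

# Placeholders - replace with your real package name, entry point, and wheel path.
body = {
    "name": "A sample wheel job to trigger from DevOps",
    "tasks": [
        {
            "task_key": "ExecuteWheel",
            "description": "Execute uploaded python wheel",
            "depends_on": [],
            # Read the cluster id from the environment, falling back to a dummy value.
            "existing_cluster_id": os.environ.get("DBRKS_CLUSTER_ID", "0000-000000-cluster1"),
            "python_wheel_task": {
                "package_name": "my_package",   # assumption: your wheel's package name
                "entry_point": "main",          # assumption: entry point defined in setup.py/setup.cfg
                "parameters": []
            },
            # Attach the wheel so it is installed on the cluster before the task runs.
            "libraries": [
                {"whl": "dbfs:/FileStore/wheels/my_package-1.0.0-py3-none-any.whl"}
            ],
            "timeout_seconds": 300,
            "max_retries": 1,
            "min_retry_interval_millis": 5000,
            "retry_on_timeout": False
        }
    ],
    "email_notifications": {},
    "max_concurrent_runs": 1
}

# Serialize to the JSON string passed as the request body, e.g.
# requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, data=body_json)
body_json = json.dumps(body)
print(body_json)
```

The rest of the original script (URL, headers, requests.post, and the response handling) can stay as-is; only the body changes.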