I am using Python 3.6 to make API calls to Azure Databricks to create a job that runs a specific notebook. I followed the instructions for using the API at this link. The only difference is that I am using Python rather than curl. The code I have written is as follows:
import requests
import os
import json
dbrks_create_job_url = "https://"+os.environ['DBRKS_INSTANCE']+".azuredatabricks.net//2.0/jobs/create"
DBRKS_REQ_HEADERS = {
    'Authorization': 'Bearer ' + os.environ['DBRKS_BEARER_TOKEN'],
    'X-Databricks-Azure-Workspace-Resource-Id': '/subscriptions/'+ os.environ['DBRKS_SUBSCRIPTION_ID'] +'/resourceGroups/'+ os.environ['DBRKS_RESOURCE_GROUP'] +'/providers/Microsoft.Databricks/workspaces/' + os.environ['DBRKS_WORKSPACE_NAME'],
    'X-Databricks-Azure-SP-Management-Token': os.environ['DBRKS_MANAGEMENT_TOKEN']}
body_json = """
    {
        "name": "A sample job to trigger from DevOps",
        "tasks": [
            {
                "task_key": "ExecuteNotebook",
                "description": "Execute uploaded notebook including tests",
                "depends_on": [],
                "existing_cluster_id": """ + os.environ["DBRKS_CLUSTER_ID"] + """,
                "notebook_task": {
                    "notebook_path": "/Users/myuser/sample-notebook",
                    "base_parameters": {}
                },
                "timeout_seconds": 300,
                "max_retries": 1,
                "min_retry_interval_millis": 5000,
                "retry_on_timeout": false
            }
        ],
        "email_notifications": {},
        "name": "my_test_job",
        "max_concurrent_runs": 1}
"""
print("Request body in json format:")
print(body_json)
response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, data=body_json)
if response.status_code == 200:
    print("Job created successfully!")
    print(response.status_code)
    print(response.content)
else:
    print("job failed!")
    raise Exception(response.content)
All the environment variables are set by my Azure DevOps pipeline. However, you don't need to run the script from a pipeline. You can run it from your local machine as long as you have a service principal with access to a Databricks workspace. To run the Python script, replace the environment variables with your own credentials.
Explaining the variables in the script:
os.environ['DBRKS_INSTANCE']: Name of the Databricks instance.
os.environ['DBRKS_BEARER_TOKEN']: The bearer token. You need this to authenticate your service principal or your user to Databricks. I explain below how to get it.
os.environ['DBRKS_MANAGEMENT_TOKEN']: The management token. You need this if the service principal you are using has not been added to the Databricks workspace as a user or admin. I explain below how to get it.
os.environ['DBRKS_SUBSCRIPTION_ID']: The ID of the Azure subscription that contains the Databricks workspace.
os.environ['DBRKS_RESOURCE_GROUP']: Name of the Azure resource group of the Databricks workspace.
os.environ['DBRKS_WORKSPACE_NAME']: Name of the Azure Databricks workspace.
os.environ["DBRKS_CLUSTER_ID"]: The ID of the cluster that will execute the job in Databricks.
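Since the script reads seven environment variables, a quick pre-flight check can report which ones are missing before any request is sent. The following is only a minimal sketch: missing_env_vars and REQUIRED_VARS are hypothetical names introduced here for illustration, and the variable names are the ones used in the script above.

```python
import os

# Variable names taken from the script above.
REQUIRED_VARS = [
    "DBRKS_INSTANCE",
    "DBRKS_BEARER_TOKEN",
    "DBRKS_MANAGEMENT_TOKEN",
    "DBRKS_SUBSCRIPTION_ID",
    "DBRKS_RESOURCE_GROUP",
    "DBRKS_WORKSPACE_NAME",
    "DBRKS_CLUSTER_ID",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_VARS)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Passing a plain dict as env makes the helper easy to try without touching the real environment.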
When I run my script, I get status code 200, which means it should have worked properly, as shown below:
However, when I look at the list of jobs, no new job has been created despite the 200 status code. As you can see below, the job I created is not there.
I also changed the API endpoint from azuredatabricks.net//2.0/jobs/create to azuredatabricks.net//2.1/jobs/create. The run still succeeds, but no job is created. I can't understand what I am doing wrong. And if I am doing something wrong, why doesn't it raise an exception instead of returning a 200 status code?
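A 200 status code on its own does not show that a job exists. A successful jobs/create call returns a JSON body containing a job_id field, so one way to verify the result is to look for that field instead of trusting the status code. A minimal sketch follows. extract_job_id is a hypothetical helper introduced here, not part of requests or Databricks.

```python
import json


def extract_job_id(status_code, body_text):
    """Return the job_id from a jobs/create response, or None if it is absent."""
    if status_code != 200:
        return None
    try:
        payload = json.loads(body_text)
    except ValueError:
        # The body is not JSON (for example an HTML page),
        # so it is not a Jobs API success response.
        return None
    if not isinstance(payload, dict):
        return None
    return payload.get("job_id")
```

With the script above, this could be called as extract_job_id(response.status_code, response.text). A None result would mean the response should not be treated as a created job.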
One final point, so that you can reproduce the problem I am facing: to get the two variables DBRKS_BEARER_TOKEN and DBRKS_MANAGEMENT_TOKEN mentioned above, you can run the following script and manually replace os.environ['DBRKS_BEARER_TOKEN'] and os.environ['DBRKS_MANAGEMENT_TOKEN'] with the values it prints:
import requests
import json
import os
TOKEN_BASE_URL = 'https://login.microsoftonline.com/' + os.environ['SVCDirectoryID'] + '/oauth2/token'
TOKEN_REQ_HEADERS = {'Content-Type': 'application/x-www-form-urlencoded'}
TOKEN_REQ_BODY = {
    'grant_type': 'client_credentials',
    'client_id': os.environ['SVCApplicationID'],
    'client_secret': os.environ['SVCSecretKey']}

def dbrks_management_token():
    TOKEN_REQ_BODY['resource'] = 'https://management.core.windows.net/'
    response = requests.get(TOKEN_BASE_URL, headers=TOKEN_REQ_HEADERS, data=TOKEN_REQ_BODY)
    if response.status_code == 200:
        print(response.status_code)
    else:
        raise Exception(response.text)
    return response.json()['access_token']

def dbrks_bearer_token():
    TOKEN_REQ_BODY['resource'] = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'
    response = requests.get(TOKEN_BASE_URL, headers=TOKEN_REQ_HEADERS, data=TOKEN_REQ_BODY)
    if response.status_code == 200:
        print(response.status_code)
    else:
        raise Exception(response.text)
    return response.json()['access_token']
DBRKS_BEARER_TOKEN = dbrks_bearer_token()
DBRKS_MANAGEMENT_TOKEN = dbrks_management_token()
os.environ['DBRKS_BEARER_TOKEN'] = DBRKS_BEARER_TOKEN
os.environ['DBRKS_MANAGEMENT_TOKEN'] = DBRKS_MANAGEMENT_TOKEN
print("DBRKS_BEARER_TOKEN",os.environ['DBRKS_BEARER_TOKEN'])
print("DBRKS_MANAGEMENT_TOKEN",os.environ['DBRKS_MANAGEMENT_TOKEN'])
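Both token functions write the 'resource' key into the shared TOKEN_REQ_BODY dictionary. That works here because each function sets the key right before sending its request, but building a fresh body per call avoids relying on that. A minimal sketch, in which token_request_body and the two constant names are hypothetical and the resource values are copied from the script above:

```python
# Resource identifiers copied from the script above.
MANAGEMENT_RESOURCE = 'https://management.core.windows.net/'
DATABRICKS_RESOURCE = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'


def token_request_body(client_id, client_secret, resource):
    """Build a fresh client-credentials form body for a single resource."""
    return {
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
        'resource': resource,
    }
```

Each call returns a new dictionary, so the management and bearer token requests no longer share state.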
Thank you for your valuable input.
You're mixing up the API versions: the tasks array can only be used with Jobs API 2.1, but you're using Jobs API 2.0. Another error is that you have // between the host name and the path.
Just change dbrks_create_job_url to "https://"+os.environ['DBRKS_INSTANCE']+".azuredatabricks.net/api/2.1/jobs/create" (note that this URL also includes the /api prefix, which the original URL is missing).
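To make the fix concrete, here is a minimal sketch that builds the corrected URL and the request body from a Python dict rather than by string concatenation. build_create_job_url and build_job_payload are hypothetical helper names, and "my-cluster-id" is a placeholder. The field values mirror the question. Two side observations about the question's body: it interpolates the cluster ID without surrounding quotes, which is not valid JSON unless the environment variable itself contains the quotes, and it lists "name" twice. Serializing a dict with json.dumps avoids both.

```python
import json


def build_create_job_url(instance, api_version="2.1"):
    """Build the jobs/create URL with a single slash and the /api prefix."""
    return ("https://" + instance + ".azuredatabricks.net/api/"
            + api_version + "/jobs/create")


def build_job_payload(cluster_id, notebook_path, job_name="my_test_job"):
    """Build a Jobs API 2.1 request body as a dict (values mirror the question)."""
    return {
        "name": job_name,
        "tasks": [
            {
                "task_key": "ExecuteNotebook",
                "description": "Execute uploaded notebook including tests",
                "depends_on": [],
                "existing_cluster_id": cluster_id,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "base_parameters": {},
                },
                "timeout_seconds": 300,
                "max_retries": 1,
                "min_retry_interval_millis": 5000,
                "retry_on_timeout": False,
            }
        ],
        "email_notifications": {},
        "max_concurrent_runs": 1,
    }


# "my-cluster-id" is a placeholder; use os.environ["DBRKS_CLUSTER_ID"] in practice.
body_json = json.dumps(
    build_job_payload("my-cluster-id", "/Users/myuser/sample-notebook")
)
```

The resulting body_json can be passed to requests.post(build_create_job_url(os.environ['DBRKS_INSTANCE']), headers=DBRKS_REQ_HEADERS, data=body_json) exactly as in the question.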