I am trying to get a Databricks Workflow running as a Service Principal. I am using a Bitbucket Cloud Repository Access Token so that the git credentials for the Databricks Service Principal to use are not tied to an individual user.
Based on the Databricks API to create git credentials, it's not clear how a Repository Access Token should fit, given there is no defined git_username other than the <hash>@bots.bitbucket.org ID, which does not work. (It works fine for Personal Access Tokens, given the username is well documented.)
I tried several variations of the git user.name Bitbucket gives me as part of the Repository Access Token.
I expect the correct magical combination of git_username and personal_access_token passed to the Databricks API to authenticate successfully. Unfortunately, it's not documented.
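You appear to have two issues here: which username to pair with a Bitbucket Repository Access Token, and how to set git credentials on a Databricks Service Principal.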
I am assuming that you are using Bitbucket Cloud.
Bitbucket Repository Access Tokens are a good way to give Bitbucket access to a 'bot' account such as CI/CD or, in your case, to an automated workflow running in another system. The generated Token is associated with the Repository rather than a human, so if a person leaves your organisation the Token will continue to work.
When a Token is generated, Bitbucket provides various forms of how to use it. Take a close look at this one:
How to use this token with your Git repository
To clone this repository using this token, run:
git clone https://x-token-auth:<long-token-here>@bitbucket.org/company/repository.git
It isn't totally clear, but it is saying that the username is x-token-auth and the password is the long token provided. This is very important!
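One way to confirm the username and token pairing before involving Databricks is to test it directly with git. This is a minimal sketch, not something from the Bitbucket page: the helper name bitbucket_token_url and the environment variable BITBUCKET_REPO_TOKEN are hypothetical, and company/repository is the placeholder path from the clone command above.

```shell
#!/usr/bin/env bash
# Build an authenticated HTTPS remote URL for a Bitbucket Cloud repository.
# x-token-auth is the fixed username Bitbucket shows for Repository Access Tokens.
# (bitbucket_token_url is a hypothetical helper name.)
bitbucket_token_url() {
  local token="$1" repo_path="$2"   # repo_path looks like company/repository
  printf 'https://x-token-auth:%s@bitbucket.org/%s.git\n' "$token" "$repo_path"
}

# Usage (needs network access and a valid token in BITBUCKET_REPO_TOKEN):
#   git ls-remote "$(bitbucket_token_url "$BITBUCKET_REPO_TOKEN" company/repository)"
```

If git ls-remote lists refs, the same username and token should be the pair to hand to Databricks.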
Setting git credentials on a Service Principal isn't easy. You'll need to use curl or Postman to generate the configuration. (Why they don't put it in the UI, I have no idea!)
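The steps are:
1. Obtain an access token for the Service Principal with an on-behalf-of call (instructions can be found on Service principals for Databricks automation | Databricks on AWS).
2. Use that token to set the git credentials for the Service Principal. From Service principals for CI/CD | Databricks on AWS: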
curl -X POST \
${DATABRICKS_HOST}/api/2.0/git-credentials \
--header 'Authorization: Bearer <service-principal-access-token>' \
--data @set-git-credentials.json \
| jq .
The important part comes in the JSON configuration file:
set-git-credentials.json:
{
"personal_access_token": "<Git Provider Access Token>",
"git_username": "x-token-auth",
"git_provider": "bitbucketCloud"
}
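To avoid pasting the token into the file by hand, the configuration can be generated from an environment variable. A minimal sketch under stated assumptions: write_git_credentials_json and BITBUCKET_REPO_TOKEN are hypothetical names, and the token is assumed not to contain double quotes or backslashes, since no JSON escaping is done.

```shell
#!/usr/bin/env bash
# Write the git-credentials payload for a Bitbucket Repository Access Token.
# $1 = token, $2 = output file (defaults to set-git-credentials.json).
# Assumption: the token contains no characters that need JSON escaping.
write_git_credentials_json() {
  local token="$1" outfile="${2:-set-git-credentials.json}"
  (
    umask 077   # the file holds a secret, so keep it private to the current user
    cat > "$outfile" <<EOF
{
  "personal_access_token": "${token}",
  "git_username": "x-token-auth",
  "git_provider": "bitbucketCloud"
}
EOF
  )
}

# Usage:
#   write_git_credentials_json "$BITBUCKET_REPO_TOKEN"
```

The resulting file can be passed to the curl call above with --data @set-git-credentials.json and deleted afterwards.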
Notice how git_username uses the x-token-auth value that Bitbucket mentioned earlier? This is the most important part. Also, make sure git_provider is set to bitbucketCloud.
As a side note, if you get it wrong, the process for deleting the git credentials is quite painful. You need to obtain the ID that was associated with the git credentials. It is displayed after running the above git-credentials command, which returns something like:
{
"credential_id": 749722601042,
"git_provider": "bitbucketCloud",
"git_username": "x-token-auth"
}
You can then use that credential_id with the Delete a credential | REST API reference | Databricks on AWS API call to delete the existing credentials and then try to set them again. If you don't remember the credential_id, you can use Get Git credentials | REST API reference | Databricks on AWS to retrieve it.
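The list and delete calls can be sketched in the same curl style as the create call. This is a hedged sketch rather than an excerpt from the Databricks docs: the function names and the SP_TOKEN variable (holding the Service Principal access token) are hypothetical, and it assumes the same /api/2.0/git-credentials path used above, with the credential_id appended for the delete.

```shell
#!/usr/bin/env bash
# Helpers around the Databricks git-credentials endpoints.
# Assumes DATABRICKS_HOST is set, and SP_TOKEN (hypothetical name) holds the
# Service Principal access token.

# Print the collection URL, or the URL of one credential when an ID is given.
git_credentials_url() {
  local base="${DATABRICKS_HOST%/}/api/2.0/git-credentials"
  if [ -n "${1:-}" ]; then
    printf '%s/%s\n' "$base" "$1"
  else
    printf '%s\n' "$base"
  fi
}

# List the credentials visible to the token's owner (shows each credential_id).
list_git_credentials() {
  curl -sS -X GET "$(git_credentials_url)" \
    --header "Authorization: Bearer ${SP_TOKEN}"
}

# Delete one credential by its credential_id.
delete_git_credential() {
  curl -sS -X DELETE "$(git_credentials_url "$1")" \
    --header "Authorization: Bearer ${SP_TOKEN}"
}

# Usage:
#   list_git_credentials | jq .
#   delete_git_credential 749722601042
```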
Bottom line: Make sure to use x-token-auth as the username.