git, databricks, credentials, bitbucket-cloud, service-principal

Bitbucket Cloud repository using Repository Access Token for Databricks Service Principal git credentials


I am trying to get a Databricks Workflow running as a Service Principal. I am using a Bitbucket Cloud Repository Access Token so that the git credentials used by the Databricks Service Principal are not tied to an individual user.

The Databricks API for creating git credentials doesn't make clear how a Repository Access Token fits, because the token has no defined git_username other than the <hash>@bots.bitbucket.org ID, which does not work. (Personal Access Tokens work fine because their username is well documented.)

I tried several variations of the git user.name that Bitbucket gives me as part of the Repository Access Token.

I expected that some combination of git_username and personal_access_token passed to the Databricks API would authenticate successfully. Unfortunately, the correct combination is not documented.


Solution

  • You appear to have two issues here:

    Bitbucket Repository Access Tokens

    I am assuming that you are using Bitbucket Cloud.

    Bitbucket Repository Access Tokens are a good way to give Bitbucket access to a 'bot' account such as CI/CD or, in your case, to an automated workflow running in another system. The generated Token is associated with the Repository rather than a human, so if a person leaves your organisation the Token will continue to work.

    When a Token is generated, Bitbucket provides various forms of how to use it. Take a close look at this one:

    How to use this token with your Git repository

    To clone this repository using this token, run:

    git clone https://x-token-auth:<long-token-here>@bitbucket.org/company/repository.git

    It isn't totally clear, but this is saying that the username is x-token-auth and the password is the long token provided. This is very important!
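
    To make the username/password split concrete, here is a minimal sketch that pulls the two parts out of a clone URL of the form shown above. The URL and the EXAMPLE_TOKEN value are placeholders, not real credentials, and the variable names are just for illustration.

```shell
# Hypothetical clone URL in the form Bitbucket displays; the token is a placeholder.
clone_url='https://x-token-auth:EXAMPLE_TOKEN@bitbucket.org/company/repository.git'

# Drop the scheme, then keep only the part before the first '@' (user:password).
userinfo="${clone_url#https://}"
userinfo="${userinfo%%@*}"

# Text before the first ':' is the username; text after it is the token.
git_username="${userinfo%%:*}"
git_token="${userinfo#*:}"

echo "username: ${git_username}"
echo "token:    ${git_token}"
```

    The first value is what goes into git_username later on, and the second is what goes into personal_access_token.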

    Databricks Service Principal

    Setting git credentials on a Service Principal isn't easy. You'll need to use curl or Postman to send the configuration to the Databricks API. (Why they don't put it in the UI, I have no idea!)

    The request, from "Service principals for CI/CD" (Databricks on AWS), is:

    curl -X POST \
    ${DATABRICKS_HOST}/api/2.0/git-credentials \
    --header 'Authorization: Bearer <service-principal-access-token>' \
    --data @set-git-credentials.json \
    | jq .
    

    The important part comes in the json configuration file:

    set-git-credentials.json:
    
    {
       "personal_access_token": "<Git Provider Access Token>",
       "git_username": "x-token-auth",
       "git_provider": "bitbucketCloud"
    }
    

    Notice how git_username uses the x-token-auth value that Bitbucket showed earlier? This is the most important part. Also, make sure the git_provider is set to bitbucketCloud.
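
    If you would rather not paste the token into the file by hand, one possible approach is to generate set-git-credentials.json from an environment variable. This is only a sketch: BITBUCKET_REPO_TOKEN is a made-up variable name, the default value is a placeholder, and it assumes the token contains no double quotes or backslashes that would need JSON escaping.

```shell
# BITBUCKET_REPO_TOKEN is a hypothetical variable name; a placeholder is used if it is unset.
BITBUCKET_REPO_TOKEN="${BITBUCKET_REPO_TOKEN:-EXAMPLE_TOKEN}"

# Write the request body. The heredoc delimiter is unquoted so the variable expands.
cat > set-git-credentials.json <<EOF
{
   "personal_access_token": "${BITBUCKET_REPO_TOKEN}",
   "git_username": "x-token-auth",
   "git_provider": "bitbucketCloud"
}
EOF

# Restrict permissions, since the file now contains a secret.
chmod 600 set-git-credentials.json
```

    The resulting file can be passed to the curl command above with --data @set-git-credentials.json, and deleted once the credentials are set.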

    As a side note, if you get it wrong, deleting the git credentials is quite painful. You need the ID that was associated with the git credentials. It is displayed in the response to the git-credentials call above, which returns something like:

    {
      "credential_id": 749722601042,
      "git_provider": "bitbucketCloud",
      "git_username": "x-token-auth"
    }
    

    You can then use that credential_id with the "Delete a credential" API call (REST API reference, Databricks on AWS) to delete the existing credentials and then try to set them again. If you don't remember the credential_id, you can use the "Get Git credentials" call from the same reference to retrieve it.
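
    As a rough sketch, the lookup and delete calls can be wrapped in two small shell functions. It assumes DATABRICKS_HOST is set as in the earlier curl command and that the Service Principal's access token is in DATABRICKS_TOKEN, which is a hypothetical variable name; the function names are also made up for this example.

```shell
# Build the git-credentials endpoint URL, optionally for a single credential id.
git_credentials_url() {
  echo "${DATABRICKS_HOST}/api/2.0/git-credentials${1:+/$1}"
}

# List the credentials of the calling identity, to recover a forgotten credential_id.
list_git_credentials() {
  curl -s -X GET \
    "$(git_credentials_url)" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
}

# Delete a single credential; pass the credential_id as the first argument.
delete_git_credential() {
  curl -s -X DELETE \
    "$(git_credentials_url "$1")" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}"
}
```

    Running list_git_credentials | jq . shows the stored credentials, and delete_git_credential 749722601042 would remove the example credential shown above, after which the POST can be repeated with a corrected JSON file.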

    Bottom line: Make sure to use x-token-auth as the username.