github azure-devops azure-pipelines azure-databricks

How to register an Azure Pipelines GitHub App token in Azure Databricks?


GitHub Apps make it possible to use a non-personal service connection to provide GitHub integration in tools such as Azure DevOps, Jenkins and a host of others.

I am asking specifically about the Azure Pipelines GitHub App. It is very easy to integrate into Azure DevOps: any GitHub repo or organization it is installed into becomes accessible via a service connection in Azure DevOps (most commonly in Pipelines).

My problem is that our CI/CD pipeline in Azure Pipelines also needs to call the Databricks API (via the Databricks CLI) to trigger a new code deployment into an Azure Databricks Git-enabled folder.

The Databricks CLI is unable to pick up GITHUB_TOKEN or to use extraHeader: Authorization: basic xxxtokenxxx. Both of these work in Azure Pipelines scripts as long as the 'checkout' task is configured with 'persistCredentials: true' in the pipeline's YAML (the token can then be picked up from the .git/config of the checked-out repo).

So I have to programmatically register new Git credentials in Databricks for our Azure / Databricks service principal by calling '/api/2.0/git-credentials'.

The API call requires git_username as a parameter (the other two parameters being personal_access_token, which I kind of have, and git_provider). And this is where I am stuck: I don't know whether a username is used internally by the "Azure Pipelines GitHub App" to authenticate against GitHub, or what it would be. I only managed to get its token. Should it be registered as a PAT or via OAuth (gitHubOAuth)? But OAuth also requires a Git username...
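For reference, the registration request described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not a definitive implementation: the function name, the host URL and the token values are hypothetical, and "gitHub" as the git_provider value is an assumption to check against the Databricks documentation. The function only builds the request; passing it to urllib.request.urlopen would send it.

```python
import json
import urllib.request


def build_git_credentials_request(host, databricks_token, git_username,
                                  git_token, git_provider="gitHub"):
    """Build (but do not send) a POST request to /api/2.0/git-credentials.

    All argument values are supplied by the caller; nothing here is read
    from the pipeline environment.
    """
    payload = {
        "git_provider": git_provider,        # assumed provider identifier
        "git_username": git_username,        # the parameter the question is about
        "personal_access_token": git_token,  # token obtained in the pipeline
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical values, for illustration only.
request = build_git_credentials_request(
    "https://example.azuredatabricks.net/", "databricks-token",
    "some-username", "ghs_example",
)
```

Building the request separately from sending it makes it easy to inspect the URL and body in a pipeline log (with the tokens masked) before any call is made.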

I could work around the issue by simply uploading the already checked-out repo into the Git-enabled folder from Azure Pipelines, but the solution I am aiming for is to use ONE service principal for everything: Azure, Databricks and GitHub calls. This would also require the least maintenance effort.

Does anyone have insights into Databricks / GitHub integration using the Azure Pipelines GitHub App and how to register it in Databricks?


Solution

  • I found a post that explains the solution.

    After decoding the token from Base64, I noticed that the token string itself (ghs_...) is actually preceded by x-access-token in the case of GitHub and the Azure Pipelines GitHub App. That x-access-token is the value to use as git_username.

    Once plugged into the Databricks API call for Git credentials registration, databricks repos update SOME_GIT_FOLDER --branch main no longer complains and works exactly as expected.

    This solves the whole problem of one service principal being able to handle all MS Azure, Databricks and GitHub calls. The last piece is in place.

    P.S. The tokens created by the Azure Pipelines GitHub App are relatively short-lived, so each pipeline that calls GitHub indirectly through Databricks needs to re-register new credentials.
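The decoding step from the answer can be sketched as below. This is a minimal illustration, assuming the persisted header has the form "AUTHORIZATION: basic <base64>" and that the decoded value is "username:token"; the function name and the sample token are hypothetical.

```python
import base64


def split_basic_auth_header(header_value):
    """Decode a 'basic <base64>' header value into (username, token)."""
    # Keep only the last whitespace-separated field, i.e. the Base64 payload.
    encoded = header_value.strip().split()[-1]
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Split on the first colon: the username comes before it, the token after.
    username, _, token = decoded.partition(":")
    return username, token


# Hypothetical header, built here only so the example is self-contained.
sample = "AUTHORIZATION: basic " + base64.b64encode(
    b"x-access-token:ghs_example"
).decode("ascii")
username, token = split_basic_auth_header(sample)
```

The resulting username and token are the values that would go into git_username and personal_access_token when registering the credentials.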