git azure-devops databricks azure-databricks databricks-repos

Databricks x Azure DevOps: Can I create a DevOps pipeline to update a Databricks repo after a merge?


I currently have a Databricks and Azure DevOps integration. At the moment, we develop in the Databricks workspace under the 'dev' repo/folder, which is connected to the Azure 'dev' repo. We also have a 'Prod' repo/folder and a 'main' Azure repo.

When the time comes, we create a merge request in Azure DevOps to pull the changes made on dev into main, but when we do this the 'prod' repo/folder in Databricks does not receive the new changes from the merge. To resolve this, we have to manually pull the changes into the repo in the Databricks workspace.

Is there any way to set up an Azure pipeline so that when we merge from dev to prod, the Databricks repo automatically pulls the changes?


Solution

  • Yes, you can pull the changes into Repos automatically. You can do that either with the Databricks Repos Update REST API or with the Databricks CLI, although right now it's better to use the so-called "legacy CLI", which has somewhat better support for repos update.

    I have a demo project that shows how to implement multi-stage code promotion using Repos + Azure DevOps, but it really comes down to the following step (taken from the full pipeline):

    - script: |
        echo "Checking out the releases branch"
    
        databricks repos update --path $(STAGING_DIRECTORY) --branch "$(Build.SourceBranchName)"
      env:
        DATABRICKS_HOST: $(DATABRICKS_HOST)
        DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
      displayName: Update Staging repository
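
For the REST API route mentioned above, a minimal sketch in Python could look like the following. It assumes the Repos Update endpoint is `PATCH /api/2.0/repos/{repo_id}` with a JSON body naming the branch, and that you already know the numeric ID of the 'Prod' repo (for example from the Repos list endpoint). The function names and the `DATABRICKS_REPO_ID` and `TARGET_BRANCH` variables are hypothetical and not part of the demo project.

```python
import json
import os
import urllib.request


def build_repo_update_request(host, token, repo_id, branch):
    """Build a PATCH request that asks Databricks to move a repo to the
    latest commit of `branch`."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_repo(host, token, repo_id, branch):
    """Send the update request and return the decoded JSON response."""
    request = build_repo_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Runs only as a script and only when the pipeline has supplied credentials.
# DATABRICKS_REPO_ID and TARGET_BRANCH are hypothetical variable names.
if __name__ == "__main__" and "DATABRICKS_HOST" in os.environ:
    print(
        update_repo(
            os.environ["DATABRICKS_HOST"],
            os.environ["DATABRICKS_TOKEN"],
            os.environ["DATABRICKS_REPO_ID"],
            os.environ.get("TARGET_BRANCH", "main"),
        )
    )
```

A pipeline step triggered on the main branch could run this script with the same `env:` mapping as the CLI step above, so the token never appears in the script itself.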