python git databricks azure-databricks databricks-repos

Programmatic checkout of Databricks Repos branch


I have an integration test that compares the output of running the same scripts from two different branches (i.e., master and a feature branch). Currently this test kicks off from my local machine, but I'd like to migrate it to a Databricks job and run it entirely from the Workflows interface.

I'm able to recreate most of the existing integration test (written in Python) using notebooks and dbutils, with the exception of the feature branch checkout. I can make a call from my local machine to the Repos REST API to perform the checkout, but (from what I can tell) I can't make that same call from a job that's running on the Databricks cloud. (I run into credentials/authentication issues when I try, and my solutions are getting increasingly hacky.)

Is there a way to check out a branch using pure Python code, something like a dbutils.repos.checkout()? Alternatively, is there a safe way to call the REST APIs from a job that's running on the Databricks cloud?


Solution

  • You can use the Repos REST API, specifically its Update command. For CI/CD, though, it's easier to use the databricks repos update command of the Databricks CLI, like this:

    databricks repos update --path <path> --branch <branch>
    

    P.S. I have an end-to-end example of doing CI/CD for Repos + Notebooks on Azure DevOps, but the approach will be the same for other systems. Here is an example of using the Databricks CLI for checkout.
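
If you'd rather stay in Python inside the job, the Update call mentioned above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the update endpoint is PATCH /api/2.0/repos/{repo_id} with a JSON body containing the branch, and that you already know the numeric repo ID. The helper names build_checkout_request and checkout_branch are made up for illustration, and reading the token from a secret scope is just one possible way to avoid hard-coding credentials.

```python
import json
import urllib.request


def build_checkout_request(host, token, repo_id, branch):
    """Build the PATCH request that asks the Repos API to switch `repo_id` to `branch`."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def checkout_branch(host, token, repo_id, branch):
    """Send the request and return the decoded JSON response (raises on HTTP errors)."""
    request = build_checkout_request(host, token, repo_id, branch)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical usage from a notebook task; the scope and key names are assumptions:
# token = dbutils.secrets.get(scope="ci", key="databricks-token")
# checkout_branch("https://<workspace-url>", token, <repo_id>, "<branch>")
```

Keeping the request construction separate from the network call makes it possible to check the URL, method, and payload without touching a real workspace.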