I have an integration test that compares the output from running the same scripts from 2 different branches (i.e., master and a feature branch). Currently this test kicks off from my local machine, but I'd like to migrate it to a Databricks job and run it entirely from the Workflows interface.
I'm able to recreate most of the existing integration test (written in Python) using notebooks and dbutils, with the exception of the feature-branch checkout. I can call the Repos REST API from my local machine to perform the checkout, but (from what I can tell) I can't make that same call from a job running on the Databricks cloud. (I run into credentials/authentication issues when I try, and my solutions are getting increasingly hacky.)
Is there a way to check out a branch using pure Python code, something like a dbutils.repos.checkout()? Alternatively, is there a safe way to call the REST APIs from a job that's running on the Databricks cloud?
You can use the Repos REST API, specifically its Update command. But if you're doing CI/CD, it's easier to use the databricks repos update command of the Databricks CLI, like this:
databricks repos update --path <path> --branch <branch>
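If you want to do the checkout from pure Python inside the job itself (closest to the dbutils.repos.checkout() you're asking about), you can call the same Update endpoint with requests. A minimal sketch, assuming it runs in a notebook task; the repo path and branch are placeholders, and pulling the host/token from the notebook context is an unofficial but commonly used trick that avoids hard-coding credentials:

import requests

# Placeholders -- substitute your own repo path and branch.
REPO_PATH = "/Repos/ci/my-project"
BRANCH = "my-feature-branch"

# Unofficial but commonly used: grab the workspace URL and an API token
# from the notebook context instead of hard-coding credentials.
ctx = dbutils.notebook.entry_point.getDbutils().notebook().getContext()
host = ctx.apiUrl().get()
token = ctx.apiToken().get()
headers = {"Authorization": f"Bearer {token}"}

# Resolve the repo ID from its workspace path (GET /api/2.0/repos).
repos = requests.get(f"{host}/api/2.0/repos",
                     params={"path_prefix": REPO_PATH},
                     headers=headers).json()["repos"]
repo_id = next(r["id"] for r in repos if r["path"] == REPO_PATH)

# Check out the branch (the Update command: PATCH /api/2.0/repos/{repo_id}).
resp = requests.patch(f"{host}/api/2.0/repos/{repo_id}",
                      json={"branch": BRANCH},
                      headers=headers)
resp.raise_for_status()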
P.S. I have an end-to-end example of doing CI/CD for Repos + Notebooks on Azure DevOps, but the approach will be the same for other systems. Here is an example of using the Databricks CLI for checkout.
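As an alternative to raw REST calls, the Databricks SDK for Python wraps the same Update operation. A sketch, assuming the databricks-sdk package is installed on the cluster (when run inside a Databricks notebook it can pick up credentials automatically); the repo path and branch are again placeholders:

from databricks.sdk import WorkspaceClient

# Inside a Databricks notebook the client can authenticate from the
# runtime's default credentials, so no token handling is needed here.
w = WorkspaceClient()

REPO_PATH = "/Repos/ci/my-project"  # placeholder repo path
repo = next(r for r in w.repos.list(path_prefix=REPO_PATH)
            if r.path == REPO_PATH)
w.repos.update(repo_id=repo.id, branch="my-feature-branch")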