
Execute git pull on a Databricks notebook using the CLI and/or API


Using Databricks Repos, you can add a Git repo to Databricks and execute Git actions such as git pull. This is done by clicking the branch name in the top left and then clicking the "Pull" button.

I would like to do this without clicking on things in my browser.

I assume both are possible (this answer implies so), but an answer covering just one of them (CLI or API) would be sufficient.


One might wonder what I expect to happen if a pull is non-trivial, e.g. the branches have diverged, or "your unstaged changes would be wiped out by pulling...". Simply erroring out would be sufficient in that case; I intend to ensure through other mechanisms that it never happens.


Solution

  • For databricks-cli, use the databricks repos update command:

    >databricks repos update -h     
    Usage: databricks repos update [OPTIONS]
    
      Checks out the repo to the given branch or tag. This call returns an error
      if the branch  or tag doesn't exist.
    
    Options:
      --repo-id TEXT  Repo ID
      --path TEXT     Workspace path of the repo object
      --branch TEXT   Branch name
      --tag TEXT      Tag name
    

    It checks out the given branch even if the repo is already on that branch, which updates the repo to the latest commit of that branch (effectively a pull):

    databricks repos update --path /Repos/.... --branch releases
    

    A working demo is available in the following repository, which shows how to integrate Repos with Azure DevOps.

  • For the REST API, there is a corresponding endpoint. The only difference from the CLI is that it accepts only the repo ID, not the path, but you can look up the repo ID from the path via the Get Status endpoint of the Workspace API. An example can be found in the history of the same demo repository (note that the Repos API may have changed since then).
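Because the question only needs the pull to fail loudly when it is not trivial, the databricks repos update command can be wrapped in a small Bash function whose exit status a script can act on. This is a sketch, assuming the legacy databricks-cli (the one whose help output is shown above) is on PATH and already configured, e.g. via databricks configure --token, and assuming it exits non-zero when the underlying API call fails. The function name pull_repo_with_cli is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: script the UI's "Pull" button with the legacy databricks-cli.
# Assumes `databricks` is on PATH and configured, and that it exits
# non-zero when the Repos API call fails (e.g. unknown branch, conflicts).
set -euo pipefail

# pull_repo_with_cli REPO_PATH BRANCH   (hypothetical helper name)
pull_repo_with_cli() {
  local repo_path="$1" branch="$2"

  if ! command -v databricks >/dev/null 2>&1; then
    echo "databricks CLI not found on PATH" >&2
    return 1
  fi

  # Checking out the branch the repo is already on moves it to the latest
  # commit of that branch. The function returns the CLI's exit status, so
  # under `set -e` a failed pull stops the calling script.
  databricks repos update --path "$repo_path" --branch "$branch"
}
```

For example, pull_repo_with_cli "/Repos/<user>/<repo>" releases, where the path is a placeholder for your own repo's workspace path.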
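The REST API route described above can be sketched in two calls: resolve the repo's workspace path to its ID with the Workspace API's get-status endpoint, then send a PATCH to the Repos API with the branch name. This sketch rests on several assumptions: the endpoint paths GET /api/2.0/workspace/get-status and PATCH /api/2.0/repos/{repo_id} reflect the 2.0 APIs and may have changed; the repo ID is the object_id field returned by get-status; DATABRICKS_HOST (workspace URL) and DATABRICKS_TOKEN (personal access token) are environment variables you set yourself; and the helper names db_api, repo_id_for_path and pull_repo_with_api are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: the UI's "Pull" button via the Databricks REST API (2.0 endpoints assumed).
# Assumes DATABRICKS_HOST (workspace URL) and DATABRICKS_TOKEN are set.
set -euo pipefail

# db_api METHOD ENDPOINT [extra curl args...]   (hypothetical helper)
# --fail makes curl exit non-zero on HTTP 4xx/5xx, so API errors are not ignored.
db_api() {
  local method="$1" endpoint="$2"
  shift 2
  curl --silent --show-error --fail -X "$method" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    "$@" "${DATABRICKS_HOST%/}/api/2.0/${endpoint}"
}

# repo_id_for_path PATH: print the object_id that get-status returns for PATH.
# Parses the JSON with sed to avoid a jq dependency; jq would be more robust.
repo_id_for_path() {
  db_api GET workspace/get-status --get --data-urlencode "path=$1" |
    sed -n 's/.*"object_id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# pull_repo_with_api PATH BRANCH: check out BRANCH again, i.e. pull it.
# Assumes BRANCH contains no double quotes (it is inserted into JSON verbatim).
pull_repo_with_api() {
  local repo_path="$1" branch="$2" repo_id
  repo_id="$(repo_id_for_path "$repo_path")"
  if [ -z "$repo_id" ]; then
    echo "could not resolve repo ID for $repo_path" >&2
    return 1
  fi
  db_api PATCH "repos/${repo_id}" \
    -H "Content-Type: application/json" \
    --data "$(printf '{"branch": "%s"}' "$branch")"
}
```

With the two variables exported, pull_repo_with_api "/Repos/<user>/<repo>" releases (placeholder path) should either update the repo or fail with curl's HTTP error, which covers the "simply error out" requirement from the question.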