
How can I deploy arbitrary files from an Azure git repo to a Databricks workspace?


Databricks recently added support for "files in repos", which is a neat feature. It gives our projects a lot more flexibility, since we can now add .json config files and even write custom Python modules that exist solely in our closed environment.

However, I just noticed that the standard way of deploying from an Azure git repo to a workspace does not support arbitrary files. First, all .py files are converted to notebooks, which breaks the custom modules we wrote for our project. Second, it intentionally skips any file that does not end in one of the following extensions: .scala, .py, .sql, .SQL, .r, .R, .ipynb, .html, .dbc. As a result, our .json config files are missing when the deployment finishes.
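To see which files in a repo are affected, a rough Python sketch of the extension filter described above might look like this. It assumes the deploy step filters purely on file extension and, because the list contains both `.sql` and `.SQL`, that the match is case-sensitive. `split_by_deployability` is a hypothetical helper, not part of any Databricks or Azure DevOps tooling.

```python
from pathlib import PurePosixPath

# Extensions the notebook-deploy step picks up, per the list above.
# Assumed case-sensitive, hence .sql/.SQL and .r/.R appear separately.
NOTEBOOK_EXTENSIONS = {".scala", ".py", ".sql", ".SQL", ".r", ".R", ".ipynb", ".html", ".dbc"}


def split_by_deployability(paths):
    """Return (deployed, skipped) lists, assuming a pure extension filter."""
    deployed, skipped = [], []
    for path in paths:
        suffix = PurePosixPath(path).suffix
        (deployed if suffix in NOTEBOOK_EXTENSIONS else skipped).append(path)
    return deployed, skipped


if __name__ == "__main__":
    files = ["notebooks/etl.py", "lib/helpers.py", "conf/settings.json", "README.md"]
    deployed, skipped = split_by_deployability(files)
    print("deployed:", deployed)
    print("skipped:", skipped)
```

Note that `lib/helpers.py` ends up in the "deployed" list, but per the behaviour described above it would arrive as a notebook rather than as an importable module, while `conf/settings.json` is dropped entirely.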

Is there any way to get around these issues or will we have to revert everything to use notebooks like we used to?


Solution

  • You need to stop deploying the old way, because it relies on the Workspace REST API, which doesn't support arbitrary files. Instead, keep a Git checkout (a Databricks Repo) in your destination workspace and update that checkout to a given branch or tag when you release. This can be done via the Repos API or the Databricks CLI. Here is an example of doing that with the CLI from an Azure DevOps pipeline:

    - script: |
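        echo "Checking out the releases branch"
        # Build.SourceBranchName is only the last segment of the ref
        # (refs/heads/release/1.0 -> 1.0), so this assumes flat branch names.
        databricks repos update --path $(STAGING_DIRECTORY) --branch "$(Build.SourceBranchName)"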
      env:
        DATABRICKS_HOST: $(DATABRICKS_HOST)
        DATABRICKS_TOKEN: $(DATABRICKS_TOKEN)
      displayName: 'Update Staging repository'
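If you would rather call the Repos API directly than depend on the CLI, a minimal standard-library Python sketch could look like the following. It assumes the `GET /api/2.0/repos` endpoint (filtered with `path_prefix`) and the `PATCH /api/2.0/repos/{repo_id}` endpoint of the Repos REST API, plus a personal access token; the helper names are illustrative, and only the first page of the list response is inspected.

```python
import json
import urllib.parse
import urllib.request


def find_repo_id(repos_response, path):
    """Return the id of the repo checked out at exactly `path`."""
    for repo in repos_response.get("repos", []):
        if repo.get("path") == path:
            return repo["id"]
    raise LookupError(f"no repo checked out at {path}")


def build_list_request(host, token, path):
    # GET /api/2.0/repos filtered by path_prefix; pagination is ignored here.
    query = urllib.parse.urlencode({"path_prefix": path})
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/repos?{query}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def build_update_request(host, token, repo_id, branch=None, tag=None):
    # PATCH /api/2.0/repos/{repo_id} with exactly one of "branch" or "tag".
    if (branch is None) == (tag is None):
        raise ValueError("pass exactly one of branch or tag")
    body = {"branch": branch} if branch is not None else {"tag": tag}
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/repos/{repo_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PATCH",
    )


def send(request):
    # urlopen raises HTTPError on non-2xx responses; an empty body becomes {}.
    with urllib.request.urlopen(request) as response:
        raw = response.read()
    return json.loads(raw) if raw else {}


def update_checkout(host, token, path, branch=None, tag=None):
    """Resolve the repo at `path`, then check out `branch` or `tag` there."""
    repo_id = find_repo_id(send(build_list_request(host, token, path)), path)
    return send(build_update_request(host, token, repo_id, branch=branch, tag=tag))
```

In a pipeline you would call something like `update_checkout(host, token, staging_path, branch="release")` with the same values as `DATABRICKS_HOST`, `DATABRICKS_TOKEN` and `STAGING_DIRECTORY`. Passing `tag=` instead of `branch=` pins the checkout to a release tag, which leaves the repo in a detached HEAD state. Keeping request construction separate from `send` makes the URLs and payloads easy to check without network access.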