databricks, databricks-connect

Switch between workspaces with databricks-connect


Is it possible to switch workspaces using databricks-connect?

I'm currently trying to switch with: spark.conf.set('spark.driver.host', cluster_config['host'])

But this returns the following error: AnalysisException: Cannot modify the value of a Spark config: spark.driver.host


Solution

  • If you look at the documentation on configuring the client, you will see that there are three methods to configure Databricks Connect: the databricks-connect configure command (which writes a ~/.databricks-connect file), environment variables, and Spark configuration properties.

    But if you use different DBR versions, then it's not enough to change the configuration properties - you also need to switch to a Python environment that contains the corresponding version of the Databricks Connect distribution.
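    As a sketch of the environment-variable method: the variable names below are the ones read by the legacy Databricks Connect client, and all values are placeholders you would replace with your own workspace details. Switching workspaces then amounts to exporting a different set of values:

```shell
# Legacy Databricks Connect reads these variables when configuring the client.
# All values are illustrative placeholders - substitute your own.
export DATABRICKS_ADDRESS="https://my-workspace.cloud.databricks.com"
export DATABRICKS_API_TOKEN="dapi-placeholder-token"   # personal access token
export DATABRICKS_CLUSTER_ID="0123-456789-abcdefgh"
export DATABRICKS_ORG_ID="1234567890123456"            # workspace/org ID
export DATABRICKS_PORT="15001"                         # default Databricks Connect port
```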

    For my own work I wrote the following Zsh function that makes it easy to switch between different setups (shards), although only one shard can be used at a time. The prerequisites are:

    # One pyenv virtualenv per shard, named <shard-name>-shard, with the
    # Databricks Connect version matching that shard's DBR version
    pyenv activate field-eng-shard
    pip install -U databricks-connect==<DBR-version>
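    The other prerequisite is one configuration file per shard, saved as ~/.databricks-connect-<shard-name>. Each file is just an ordinary Databricks Connect configuration file - the JSON that databricks-connect configure writes. A sketch of its shape (the keys below match the legacy client; all values are placeholders):

```json
{
  "host": "https://my-workspace.cloud.databricks.com",
  "token": "dapi-placeholder-token",
  "cluster_id": "0123-456789-abcdefgh",
  "org_id": "1234567890123456",
  "port": "15001"
}
```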
    
    # Switch the active shard: repoint ~/.databricks-connect at that shard's
    # config file and activate the matching pyenv virtualenv.
    function use-shard() {
        SHARD_NAME="$1"
        if [ -z "$SHARD_NAME" ]; then
            echo "Usage: use-shard shard-name"
            return 1
        fi
        # Refuse to touch a real file - only manage a symlink we created
        if [ ! -L ~/.databricks-connect ] && [ -f ~/.databricks-connect ]; then
            echo "There is a ~/.databricks-connect file - possibly you configured another shard"
            return 1
        elif [ -f ~/.databricks-connect-"${SHARD_NAME}" ]; then
            rm -f ~/.databricks-connect
            ln -s ~/.databricks-connect-"${SHARD_NAME}" ~/.databricks-connect
            # Switch to the virtualenv holding this shard's databricks-connect;
            # ignore the error if no virtualenv is currently active
            pyenv deactivate 2>/dev/null
            pyenv activate "${SHARD_NAME}-shard"
        else
            echo "There is no configuration file for shard: ~/.databricks-connect-${SHARD_NAME}"
            return 1
        fi
    }
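    The core of the function is the symlink trick: the fixed path that Databricks Connect reads becomes a symlink, and switching shards is just repointing it. A minimal, self-contained demo of that idea, using a temporary directory instead of $HOME and hypothetical shard names so it is safe to run anywhere:

```shell
# Two per-shard config files with distinct (placeholder) contents
workdir=$(mktemp -d)
echo '{"host": "https://shard-a.example.com"}' > "$workdir/.databricks-connect-shard-a"
echo '{"host": "https://shard-b.example.com"}' > "$workdir/.databricks-connect-shard-b"

# "Activate" shard-a: the fixed config path becomes a symlink to its file
ln -sf "$workdir/.databricks-connect-shard-a" "$workdir/.databricks-connect"

# Switching shards is just repointing the symlink
ln -sf "$workdir/.databricks-connect-shard-b" "$workdir/.databricks-connect"
cat "$workdir/.databricks-connect"   # now shows the shard-b host
```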