I have installed databricks-connect on Windows in a Conda environment.
The tool offers only one command:
>databricks-connect -h
usage: databricks-connect.exe [-h] {test}
positional arguments:
{test}
options:
-h, --help show this help message and exit
When I run the test command, I get:
>databricks-connect test
* Checking Python version
* Creating and validating a session with the default configuration
<Config: host=https://adb-<NUMBERS>859.19.azuredatabricks.net, token=***, auth_type=pat>
Traceback (most recent call last):
  File "...\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "...\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "...\databricks-connect.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "...\databricks\connect\cli.py", line 55, in main
    test()
  File "...\databricks\connect\cli.py", line 38, in test
    spark = DatabricksSession.builder.validateSession(True).getOrCreate()
  File "...\databricks\connect\session.py", line 390, in getOrCreate
    return self._from_sdkconfig(Config(), self._gen_user_agent(),
  File "...\databricks\connect\cache.py", line 53, in wrapper
    cache[cache_id] = func(*args, **kwargs)
  File "...\databricks\connect\session.py", line 436, in _from_sdkconfig
    raise Exception("Cluster id is required but was not specified.")
Exception: Cluster id is required but was not specified.
Is there an issue with the tool or with my configuration? This is what my .databrickscfg looks like:
[DEFAULT]
host = https://adb-<NUMBERS>859.19.azuredatabricks.net/
token = <HASH>232-2
jobs-api-version = 2.0
[test]
host = https://adb-<NUMBERS>859.19.azuredatabricks.net/
token = <HASH>232-2
jobs-api-version = 2.0
[acc]
host = https://adb-<NUMBERS>667.7.azuredatabricks.net/
token = <HASH>268-2
jobs-api-version = 2.0
[prod]
host = https://adb-<NUMBERS>558.18.azuredatabricks.net/
token = <HASH>36d-2
jobs-api-version = 2.0
and my .databricks-connect looks like this:
[DEFAULT]
host = https://adb-<NUMBERS>859.19.azuredatabricks.net/
token = <NUMBERS>417-2
cluster_id = 0208-<NUMBERS>-th3jhcdp
org_id = <NUMBERS>66859
port = 15001
One possible solution: your .databrickscfg should contain the cluster_id for each profile, e.g.:
[DEFAULT]
host = https://adb-<NUMBERS>.<NUMBER>.azuredatabricks.net
cluster_id = <NUMBERS>-<NUMBERS>-<NUMBERSANDLETTERS>
token = <TOKENHASH>
You can add it manually or by running the following command, as described in the Azure Databricks documentation:
databricks configure --configure-cluster --profile DEFAULT
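If you have several profiles, it may help to check which of them still lack a cluster_id before running the test again. The following is only a minimal sketch using Python's standard configparser; the function name profiles_missing_cluster_id and the sample values are made up for illustration, and it deliberately treats [DEFAULT] as an ordinary profile rather than as a source of inherited keys.

```python
import configparser


def profiles_missing_cluster_id(cfg_text: str) -> list[str]:
    """Return the profiles that do not define cluster_id themselves."""
    # default_section is set to a dummy name so that [DEFAULT] is parsed as
    # an ordinary section and its keys are not inherited by other profiles.
    # interpolation=None keeps '%' characters in tokens from being expanded.
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read_string(cfg_text)
    return [name for name in parser.sections() if "cluster_id" not in parser[name]]


# Placeholder content, not a real workspace or token.
sample = """
[DEFAULT]
host = https://example.invalid
token = placeholder
[test]
host = https://example.invalid
token = placeholder
cluster_id = placeholder-cluster
"""
print(profiles_missing_cluster_id(sample))  # ['DEFAULT']
```

To run it against your own file, pass in the text of .databrickscfg from your home directory (assuming that is where your file lives).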
Notes:
I'm using PAT auth for databricks-connect in my virtualenv.
I don't have a .databricks-connect file, but I'm not using a Conda env.
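If you would rather not touch the config file, another option with PAT auth is to pass the cluster id explicitly when building the session. This is a sketch, assuming a databricks-connect version that provides DatabricksSession (the traceback in the question suggests yours does) and that its builder accepts remote(host=..., token=..., cluster_id=...); the helper names remote_kwargs and connect are made up for illustration.

```python
import configparser


def remote_kwargs(cfg_text: str, profile: str, cluster_id: str) -> dict[str, str]:
    """Collect host and token from one profile and add an explicit cluster id."""
    # [DEFAULT] is parsed as an ordinary section (see default_section below).
    parser = configparser.ConfigParser(
        default_section="__unused__", interpolation=None
    )
    parser.read_string(cfg_text)
    section = parser[profile]
    return {
        "host": section["host"],
        "token": section["token"],
        "cluster_id": cluster_id,
    }


def connect(cfg_text: str, profile: str, cluster_id: str):
    """Build a session with explicit connection details (hypothetical helper)."""
    # Imported lazily so remote_kwargs works without databricks-connect installed.
    from databricks.connect import DatabricksSession

    kwargs = remote_kwargs(cfg_text, profile, cluster_id)
    return DatabricksSession.builder.remote(**kwargs).getOrCreate()
```

Setting the DATABRICKS_CLUSTER_ID environment variable should also work in place of the explicit argument, if you prefer to keep the id out of your code.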