I am trying to install Dask on Azure Databricks, following this documentation: https://github.com/dask-contrib/dask-databricks
First, I created the init script and added it to the cluster. Once the cluster is running, the Event log shows the message "Finished init scripts execution.":
{
  "init_scripts": {
    "reported_for_node": "0000-1111111-xxxxxxxx_10_139_64_16",
    "global": [],
    "cluster": [
      {
        "workspace": {
          "destination": "/Users/XXXXXXXXXX/dask-init.sh"
        },
        "status": "SUCCEEDED",
        "execution_duration_seconds": 37
      }
    ]
  }
}
After that, I try to run anything from a notebook, but the session apparently never starts, and I get the following error:
Failure starting repl. Try detaching and re-attaching the notebook.
at com.databricks.spark.chauffeur.ExecContextState.processInternalMessage(ExecContextState.scala:346)
I have also tried detaching and re-attaching, but it never works. Any advice, or any other way to install Dask on Azure Databricks?
Thanks in advance.
EDIT: RESOLVED
As @tom schimoler suggested, I followed these steps to resolve the issue:
- In my init script, I pinned the NumPy version to 1.24.2:
#!/bin/bash
# Pin numpy to 1.24.2 so pip does not upgrade it to numpy 2.x
/databricks/python/bin/pip install numpy==1.24.2
# Install Dask + Dask Databricks
/databricks/python/bin/pip install --upgrade "dask[complete]" dask-databricks
# Start Dask cluster components
dask databricks run
- I used a Runtime 15.4 LTS ML cluster.
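Once the cluster starts cleanly with this init script, a notebook can connect to the Dask scheduler. The following is a minimal sketch, assuming the `dask_databricks.get_client()` helper described in the dask-databricks README; the wrapper name `get_dask_client` is hypothetical and only there to keep the import lazy.

```python
def get_dask_client():
    """Return a Dask client connected to the scheduler started by the init script.

    Assumes the init script above ran successfully, so that the
    dask-databricks package is installed on the cluster.
    """
    # Imported lazily: the package only exists on a cluster that ran the init script.
    import dask_databricks

    return dask_databricks.get_client()


# In a notebook cell on the cluster:
#   client = get_dask_client()
#   client  # displays the scheduler address and the connected workers
```

If the import fails or the client cannot connect, the init script log is the first place to look.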
I just ran into this same issue, although everything was working fine a few weeks ago. In my case, it turned out to be a library dependency issue.
As it happens, the latest version of dask (2024.8.2, released on 2024-08-30) requires numpy >= 1.24, and I'm on Runtime 15.4 LTS ML, which ships numpy 1.23. Because the preinstalled numpy is too old and the requirement has no upper bound, pip upgrades to the newest release, numpy 2.1, which breaks compatibility with the other libraries preinstalled on the runtime. So you can try pinning numpy==1.24 in your init script along with the dask install.
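To illustrate why an unpinned install ends up on numpy 2.x, here is a rough sketch of the selection logic. It is a deliberate simplification of pip's resolver, and the candidate list (including the patch versions) is illustrative rather than the real package index.

```python
def parse(version_string):
    """Turn '1.24.2' into (1, 24, 2) so versions compare numerically."""
    return tuple(int(part) for part in version_string.split("."))


def satisfies_minimum(candidate, minimum):
    """True if candidate >= minimum, mimicking a 'numpy>=1.24' requirement."""
    return parse(candidate) >= parse(minimum)


# Illustrative candidates: the runtime's numpy, a 1.24.x release, and a 2.1 release.
candidates = ["1.23.5", "1.24.2", "2.1.0"]

# Only a lower bound is given, so every newer release qualifies.
allowed = [v for v in candidates if satisfies_minimum(v, "1.24")]

# With nothing pinned, the newest allowed release wins.
newest = max(allowed, key=parse)
print(allowed, newest)
```

Pinning numpy explicitly in the init script removes that choice from pip, which is why the pin fixes the problem.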
I'm not sure this is what's happening in your case. If you can open a terminal on your cluster, you should be able to verify which versions of dask and numpy got installed.
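One way to do that check is a short script using the standard library's `importlib.metadata`. This is a sketch, assuming it is run with the cluster's own interpreter (presumably the one next to the `/databricks/python/bin/pip` used in the init script); the function name `installed_versions` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("numpy", "dask", "dask-databricks")):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Print one line per package so a mismatch is easy to spot.
for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

If numpy reports a 2.x version on a runtime that shipped 1.23, the unpinned upgrade described above is the likely cause.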