I want to get the cluster link (or the cluster ID, so I can compose the link manually) inside a running Spark job.
This will be used to print the link in an alerting message, making it easier for engineers to reach the logs.
Is it possible to achieve that in a Spark job running in Databricks?
When a Databricks cluster starts, a number of Spark configuration properties are added. Most of them have names starting with spark.databricks; you can find all of them in the Environment tab of the Spark UI.
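For example, you can list the Databricks-specific properties programmatically instead of opening the Spark UI; a minimal Python sketch (assuming a SparkSession named spark, as in a Databricks notebook):

# Print all Spark conf entries whose key starts with spark.databricks
for key, value in spark.sparkContext.getConf().getAll():
    if key.startswith("spark.databricks"):
        print(key, "=", value)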
The cluster ID is available as the spark.databricks.clusterUsageTags.clusterId property, and you can get it with:
spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
You can get the workspace host name via the dbutils.notebook.getContext().apiUrl.get call (in Scala), or dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get() (in Python).
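Putting it together, here is a minimal Python sketch that composes the cluster link for an alert message. Note two assumptions: it runs where dbutils is available (a notebook context), and the /#setting/clusters/<id>/configuration path is based on the current Databricks UI and may change:

# Cluster ID from the Spark conf set at cluster startup
cluster_id = spark.conf.get("spark.databricks.clusterUsageTags.clusterId")
# Workspace URL from the notebook context (Python form of the dbutils call)
workspace_url = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().get()
# Assumed URL pattern for the cluster page; adjust if your workspace uses a different layout
cluster_link = f"{workspace_url}/#setting/clusters/{cluster_id}/configuration"
print(cluster_link)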