Hello everyone, I have a problem with Apache Spark (version 3.3.1) on k8s.
In short: when I run the statement
print(sc.uiWebUrl)
within a pod, I would like to get a URL that is accessible from outside the k8s cluster, something like:
http://{{my-ingress-host}}
Long story:
I want to create a workspace for Apache Spark on k8s, where the driver's pod is the workspace that I work in. I want to let the client run Apache Spark either with pyspark-shell or with the pyspark Python library.
Either way, I want the UI's web URL to be one that is accessible from the outside world (outside the k8s cluster).
Why? Because of UX: I want to make my client's life easier.
Because I run on k8s, part of the configuration of my Apache Spark program is:
spark.driver.host={{driver-service}}.{{drivers-namespace}}.svc.cluster.local
spark.driver.bindAddress=0.0.0.0
Because of that, the output of this code:
print(sc.uiWebUrl)
would be:
http://{{driver-service}}.{{drivers-namespace}}.svc.cluster.local:4040
The same address is also displayed in the pyspark-shell.
So my question is: is there a way to change the web UI URL's host to a host that I have defined in my ingress, to make my client's life easier?
So the new output would be:
http://{{my-defined-host}}
Other points, to help tailor the solution as closely as possible:
nginx ingress in my k8s cluster. Maybe I have a HAPROXY ingress. But I would want to be coupled to my ingress implementation as little as possible.
pyspark-shell displays the welcome screen.
ui.proxy configurations, and they haven't helped. And sometimes they made things worse.
Thanks in advance, everyone; any help would be appreciated.
You can change your web UI's host to a host of your choice by setting the SPARK_PUBLIC_DNS environment variable. This needs to be done on the driver, since the web UI runs on the driver.
To set the port for the web UI, use the spark.ui.port config parameter.
Putting both together with spark-submit, for example, gives something like the following:
bin/spark-submit \
--class ... \
--master k8s://... \
....
....
....
--conf spark.kubernetes.driverEnv.SPARK_PUBLIC_DNS=YOUR_VALUE_HERE \
--conf spark.ui.port=YOUR_WANTED_PORT_HERE \
...
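For the workspace case from the question, where the pyspark library is used directly inside the driver pod rather than through spark-submit, the same two settings could probably be applied from Python. The following is only a rough sketch that I have not run against 3.3.1 on k8s: the helper names ui_settings and start_session and the host spark.example.com are made up for illustration, and it assumes the driver JVM has not been started yet, since PySpark passes the Python process's environment to the JVM when it launches it.

```python
import os
from typing import Dict, Tuple


def ui_settings(public_host: str, ui_port: int) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (environment variables, Spark conf entries) for the web UI.

    Hypothetical helper: it only collects the two settings named in the
    answer so they can be applied in one place.
    """
    env = {"SPARK_PUBLIC_DNS": public_host}   # host shown in the UI URL
    conf = {"spark.ui.port": str(ui_port)}    # port the UI binds to
    return env, conf


def start_session(public_host: str, ui_port: int):
    """Start a SparkSession with the UI settings applied (needs pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily; requires pyspark

    env, conf = ui_settings(public_host, ui_port)
    # Assumption: this runs before the driver JVM is launched, so the JVM
    # inherits the variable from this Python process.
    os.environ.update(env)

    builder = SparkSession.builder
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # spark.example.com is a placeholder for the host defined in your ingress.
    spark = start_session("spark.example.com", 4040)
    print(spark.sparkContext.uiWebUrl)
```

For pyspark-shell the JVM is already running by the time you get a prompt, so in that case the variable would need to be set in the pod's environment before the shell starts (for example in the pod spec). As far as I know, the printed URL still includes the port, so expect something like http://{{my-defined-host}}:PORT rather than a bare host.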