I am new to Kubernetes but not to Apache Spark. I am currently working on EMR on EKS, which is essentially Spark on Kubernetes, and I can't get my head around the difference between `spark.kubernetes.driver.request.cores`, `spark.kubernetes.driver.limit.cores`, and `spark.driver.cores`.

My understanding is that `spark.kubernetes.driver.request.cores` is the amount of CPU the pod running my driver is allocated when the driver pod comes up, and that `spark.kubernetes.driver.limit.cores` is the maximum it can go up to if vertical autoscaling is enabled. I also thought that the cores available to my driver for processing equal whatever is allocated to the pod, but I am not sure that is the case. In my job's logs, all three properties are populated, so I am confused about whether setting `spark.kubernetes.driver.request.cores` and `spark.kubernetes.driver.limit.cores` to higher values will help my Spark job at all, or whether the driver will keep using the value specified in `spark.driver.cores` and won't benefit from vertical autoscaling.
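For concreteness, an illustrative submission that sets all three (the master URL, container image, and application jar below are placeholders, not my real setup):

```sh
# Illustrative spark-submit; on EMR on EKS the same --conf pairs would go
# inside sparkSubmitParameters of a StartJobRun request.
spark-submit \
  --master k8s://https://<k8s-apiserver>:443 \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.driver.cores=1 \
  --conf spark.kubernetes.driver.request.cores=500m \
  --conf spark.kubernetes.driver.limit.cores=2 \
  --conf spark.kubernetes.container.image=<spark-image> \
  <application-jar>
```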
Your understanding of `spark.kubernetes.driver.request.cores` and `spark.kubernetes.driver.limit.cores` is correct: they map to the Kubernetes CPU request and CPU limit on the driver pod's container.

To answer your question, you can see in the docs ([Running Spark on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html)) that `spark.kubernetes.driver.request.cores` takes precedence over `spark.driver.cores` for specifying the driver pod's CPU request if it is set.
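One way to verify which values actually landed on the pod is to inspect the driver pod directly. A minimal sketch, assuming a single-container driver pod (the pod name is a placeholder):

```sh
# Print the driver container's CPU request/limit. With the settings from the
# question (request.cores=500m, limit.cores=2, driver.cores=1) you should see
# requests.cpu=500m and limits.cpu=2, i.e. the Kubernetes-specific properties
# win for the pod's resources, while spark.driver.cores remains what the
# driver process itself is configured with.
kubectl get pod <driver-pod-name> -o jsonpath='{.spec.containers[0].resources}'
```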