apache-spark, kubernetes, amazon-eks

difference between spark.kubernetes.driver.request.cores, spark.kubernetes.driver.limit.cores and spark.driver.cores


I am new to Kubernetes but not to Apache Spark. I am currently working with EMR on EKS, which is essentially Spark on Kubernetes, and I can't get my head around the difference between spark.kubernetes.driver.request.cores, spark.kubernetes.driver.limit.cores and spark.driver.cores.

My understanding is that spark.kubernetes.driver.request.cores is the amount of CPU requested for the pod running my driver when that pod comes up, and spark.kubernetes.driver.limit.cores is the maximum it can grow to if vertical autoscaling is enabled. I also assumed that the cores available to my driver for processing equal whatever is allocated to the pod, but I am not sure that is the case.

From my job's logs, all three properties are being populated, so I am confused about whether setting spark.kubernetes.driver.request.cores and spark.kubernetes.driver.limit.cores to higher values will help my Spark job at all, or whether the driver will just keep using the value specified in spark.driver.cores and never benefit from vertical autoscaling.


Solution

  • Your understanding of spark.kubernetes.driver.request.cores and spark.kubernetes.driver.limit.cores is correct.

    To answer your question: as the Spark on Kubernetes docs state, spark.kubernetes.driver.request.cores, if set, takes precedence over spark.driver.cores when determining the driver pod's CPU request.
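For illustration, here is a minimal sketch of how the three properties could be passed to an EMR on EKS job through boto3's emr-containers client. The virtual cluster ID, execution role ARN, release label, and S3 entry point are placeholders, not values from your setup; the --conf flags are the three properties from your question.

```python
import boto3

# Placeholder identifiers -- substitute your own virtual cluster, IAM role, and script.
VIRTUAL_CLUSTER_ID = "abc123virtualcluster"
EXECUTION_ROLE_ARN = "arn:aws:iam::111122223333:role/emr-on-eks-job-role"

emr_containers = boto3.client("emr-containers", region_name="us-east-1")

response = emr_containers.start_job_run(
    name="driver-cores-demo",
    virtualClusterId=VIRTUAL_CLUSTER_ID,
    executionRoleArn=EXECUTION_ROLE_ARN,
    releaseLabel="emr-6.15.0-latest",  # placeholder EMR release label
    jobDriver={
        "sparkSubmitJobDriver": {
            "entryPoint": "s3://my-bucket/jobs/etl.py",  # placeholder entry point
            "sparkSubmitParameters": (
                # Cores advertised to the driver process itself.
                "--conf spark.driver.cores=2 "
                # CPU request on the driver pod spec; per the docs, this takes
                # precedence over spark.driver.cores for the pod request if set.
                "--conf spark.kubernetes.driver.request.cores=1.5 "
                # CPU limit on the driver pod spec.
                "--conf spark.kubernetes.driver.limit.cores=3"
            ),
        }
    },
)
print(response["id"])
```

Once the driver pod is up, describing it with kubectl should show the resulting CPU request and limit on the driver container, which is a quick way to confirm which property actually took effect.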