python, scala, hadoop, apache-spark, executors

Spark - How many Executors and Cores are allocated to my spark job


Spark's architecture revolves entirely around the concepts of executors and cores. I would like to see, in practice, how many executors and cores are running for my Spark application in a cluster.

I was trying to use the snippet below in my application, but with no luck.

val conf = new SparkConf().setAppName("ExecutorTestJob")
val sc = new SparkContext(conf)
conf.get("spark.executor.instances")
conf.get("spark.executor.cores")

Is there any way to get those values using the SparkContext object, the SparkConf object, etc.?


Solution

  • Scala (programmatic way):

    getExecutorStorageStatus and getExecutorMemoryStatus both return the executors, including the driver, as in the example snippet below.

    /** Method that just returns the current active/registered executors
      * excluding the driver.
      * @param sc The spark context to retrieve registered executors.
      * @return a list of executors each in the form of host:port.
      */
    def currentActiveExecutors(sc: SparkContext): Seq[String] = {
      val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
      val driverHost: String = sc.getConf.get("spark.driver.host")
      allExecutors.filter(! _.split(":")(0).equals(driverHost)).toList
    }
    
    sc.getConf.getInt("spark.executor.instances", 1)
    
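    For completeness, here is a minimal usage sketch (assuming a running SparkContext named sc, and that spark.executor.cores was set at submit time; the fallback value of 1 below is an assumption, not a fixed Spark default):

    // Minimal usage sketch; the fallback of 1 is an assumption, not a fixed Spark default.
    val activeExecutors = currentActiveExecutors(sc)          // host:port strings, driver excluded
    val executorCores   = sc.getConf.getInt("spark.executor.cores", 1)
    println(s"Registered executors (excluding driver): ${activeExecutors.size}")
    println(s"Approximate total executor cores: ${activeExecutors.size * executorCores}")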

    Similarly, you can get all the properties and print them as below; you may find the cores information there as well:

    sc.getConf.getAll.mkString("\n")
    

    OR

    sc.getConf.toDebugString
    

    Mostly, spark.executor.cores (for the executors) and spark.driver.cores (for the driver) should hold this value.
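    As an illustration, a small sketch (assuming those properties were set when the job was submitted) that pulls just the core-related entries out of the full configuration dump:

    // Print only the core-related properties from the configuration.
    sc.getConf.getAll
      .filter { case (key, _) => key.endsWith(".cores") }
      .foreach { case (key, value) => println(s"$key = $value") }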

    Python:

    The above methods, getExecutorStorageStatus and getExecutorMemoryStatus, are not implemented in the Python API.

    EDIT: But they can be accessed using the Py4J bindings exposed from the SparkContext.

    sc._jsc.sc().getExecutorMemoryStatus()