I need to use this parameter, so how can I get the number of workers? In Scala I can call sc.getExecutorMemoryStatus to get the available number of workers, but in PySpark there seems to be no API exposed to get this number.
In Scala, getExecutorStorageStatus and getExecutorMemoryStatus both return the executors including the driver, so you can count the workers by filtering the driver out, as in the example snippet below:
/** Returns the currently active/registered executors, excluding the driver.
  * @param sc The Spark context to retrieve registered executors from.
  * @return a list of executors, each in the form host:port.
  */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  val allExecutors = sc.getExecutorMemoryStatus.map(_._1)
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(!_.split(":")(0).equals(driverHost)).toList
}
But in the Python API this is not implemented; @DanielDarabos's answer also confirms this.
The equivalent of this in Python is:
sc.getConf().get("spark.executor.instances")
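For context, a minimal runnable sketch of that approach (the variable name n_executors is mine, not part of any Spark API). Note that spark.executor.instances is only populated under static allocation; with dynamic allocation or local mode it may be unset, so a default is supplied:
%python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("executor-count").getOrCreate()
sc = spark.sparkContext

# "spark.executor.instances" may be absent (e.g. dynamic allocation),
# so fall back to "1" rather than failing on None
n_executors = int(sc.getConf().get("spark.executor.instances", "1"))
print(n_executors)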
Edit (Python):
%python
sc = spark._jsc.sc()
# getExecutorInfos() includes the driver, so subtract 1 to count only workers
n_workers = len([executor.host() for executor in sc.statusTracker().getExecutorInfos()]) - 1
print(n_workers)
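If you need this in more than one place, here is a sketch wrapping the same trick in a helper (the name num_workers is my own, not a Spark API); keep in mind that spark._jsc is an internal attribute and may change between Spark versions:
%python
def num_workers(spark):
    # Count registered executors via the JVM SparkContext,
    # subtracting 1 to exclude the driver
    jsc = spark._jsc.sc()
    return len([e.host() for e in jsc.statusTracker().getExecutorInfos()]) - 1

print(num_workers(spark))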
As Danny mentioned in the comments, if you want to cross-verify the two, you can use the statements below.
%python
sc = spark._jsc.sc()
result1 = sc.getExecutorMemoryStatus().keys()  # lists all the executors plus the driver
result2 = len([executor.host() for executor in sc.statusTracker().getExecutorInfos()]) - 1
print(result1, end='\n')
print(result2)
Example result:
Set(10.172.249.9:46467)
0

Here both statements agree: the Set contains only the driver's endpoint, and the worker count is 0, meaning no separate executors were registered (as you would see in local mode).