I often see Spark spinning up more executors on a given node than it should. This spikes the load average on the node, which eventually results in lost executors or the node being marked bad/unhealthy.
My Spark/YARN configuration is as follows: yarn.nodemanager.resource.cpu-vcores is set to 6 on the master, core, and task nodes, which should mean at most 6 executors run on a node (assuming one vcore per executor; see the sketch below). Question: why does YARN place more executors on a node than the configured vcores allow?
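For context, nothing in my configuration sets spark.executor.cores, so I am assuming the YARN default of one core per executor; with that, 6 vcores should cap a node at 6 executors. A hypothetical submit along these lines (the memory, executor count, and script name are illustrative, not my real values):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 1 \
  --executor-memory 4g \
  --num-executors 20 \
  my_job.py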
My startup configuration JSON for AWS EMR is below. The same JSON was applied to the node configurations for the master, core, and task nodes.
I set the heartbeat and health-check values high only to make sure executors are not lost. A job runs for 1-1.5 hours; if the executors can just hang on, the job completes, otherwise the whole 1.5 hours of work is wasted.
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.resource.cpu-vcores": "6",
      "yarn.nodemanager.resource.memory-mb": "30000",
      "yarn.resourcemanager.nodemanagers.heartbeat-interval-max-ms": "60000",
      "yarn.resourcemanager.nodemanagers.heartbeat-interval-min-ms": "60000",
      "yarn.resourcemanager.nodemanagers.heartbeat-interval-ms": "60000",
      "yarn.nodemanager.health-checker.timeout-ms": "72000000",
      "yarn.nodemanager.health-checker.interval-ms": "36000000",
      "yarn.resourcemanager.application-timeouts.monitor.interval-ms": "180000"
    }
  },
  {
    "Classification": "mapred-site",
    "Properties": {
      "yarn.app.mapreduce.am.scheduler.heartbeat.interval-ms": "60000",
      "yarn.app.mapreduce.am.hard-kill-timeout-ms": "600000"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.executor.heartbeatInterval": "600s",
      "spark.network.timeout": "7200s"
    }
  }
]
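For reference, one way to supply such a configuration file at cluster launch is the AWS CLI (the cluster name, release label, instance type, and count here are placeholders, not my actual setup):

aws emr create-cluster \
  --name "spark-job-cluster" \
  --release-label emr-6.9.0 \
  --applications Name=Spark \
  --instance-type m5.2xlarge \
  --instance-count 4 \
  --configurations file://configurations.json \
  --use-default-roles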
I was finally able to find the fix. YARN's CapacityScheduler uses the DefaultResourceCalculator by default, which considers only memory when placing containers, so it keeps allocating executors to a node as long as there is enough memory for a new one and ignores the vcore limit. This can be fixed by setting the flag
"yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
in capacity-scheduler.xml, which makes the scheduler account for vcores as well as memory.
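On a plain Hadoop install that would look roughly like the property below in capacity-scheduler.xml (shown in isolation; the rest of the file stays as is):

<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>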
In my case, because I am using EMR, the configuration below is what helped.
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.resource.cpu-vcores": "5"
    }
  },
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.resource-calculator": "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator"
    }
  }
]
With the above JSON applied, I verified that YARN no longer allocates more vcores on a node to executors than configured.
Steps: new EMR console > select cluster > Configurations > Instance group configurations > select the master radio button > Reconfigure > paste the above JSON > Save.
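To check the per-node allocation from the command line instead of the ResourceManager UI, something like the following on the master node should show vcores used versus capacity (the node ID is cluster-specific, so substitute one from the list output):

# List NodeManagers and their IDs
yarn node -list -all

# Show memory/vcores used vs. capacity for one node
yarn node -status ip-10-0-0-12.ec2.internal:8041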