apache-spark, hadoop-yarn, google-cloud-dataproc, dataproc

YARN allocates only 1 core per container when running Spark on YARN


Please ensure dynamic allocation is not killing your containers while you monitor the YARN UI. See the answer below.

Issue: I can start the SparkSession with any number of cores per executor, and YARN will still report an allocation of only one core per container. I have tried all the solutions available online, given here, here etc.
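For reference, this is a minimal sketch of the executor settings I request when starting the session. The property names are the standard Spark ones; the values are illustrative, and in my actual code they are passed via SparkSession.builder.config(...) (or equivalently as --conf flags to spark-submit, as built below).

```python
# Executor settings requested at session start; values are illustrative.
executor_conf = {
    "spark.executor.cores": "4",       # cores requested per executor
    "spark.executor.instances": "2",   # number of executors
}

# The same settings expressed as spark-submit flags:
flags = " ".join(f"--conf {k}={v}" for k, v in sorted(executor_conf.items()))
print(flags)
```

Regardless of the value of spark.executor.cores here, the YARN UI shows one vCore per container.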

The solution is:

  1. configure yarn-site.xml to use capacity scheduling
<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
  2. configure capacity scheduler (capacity-scheduler.xml) to use dominant resource scheduling
<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
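On Dataproc specifically, the same two settings can also be supplied as cluster properties at creation time instead of editing the XML files by hand. This is a sketch: the yarn: and capacity-scheduler: file prefixes are Dataproc's convention for targeting yarn-site.xml and capacity-scheduler.xml, and the cluster name is made up.

```python
# Build the --properties value for gcloud dataproc clusters create.
# Prefix "yarn:" targets yarn-site.xml; "capacity-scheduler:" targets
# capacity-scheduler.xml. Property names are the real YARN ones.
properties = ",".join([
    "yarn:yarn.resourcemanager.scheduler.class="
    "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler",
    "capacity-scheduler:yarn.scheduler.capacity.resource-calculator="
    "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator",
])
print(f"gcloud dataproc clusters create my-cluster --properties='{properties}'")
```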

However, the YARN GUI still shows that the cluster is allocating only one core per executor.

I have read the answer here, which says that this is a known bug in the capacity scheduler and that the actual fix is to configure YARN to use fair scheduling. However, fair scheduling would be unnecessarily complicated, and supposedly what the YARN GUI displays is merely a reporting issue: the executors actually do have the right number of cores allocated. But that answer is 5 years old, and I would assume such a bug would have been resolved in the meantime.

So, I am asking this question to see whether the bug still persists, whether my understanding of the issue is wrong, or whether I am doing something wrong and the issue can now be resolved without getting into the weeds of fair scheduling.


Solution

  • This is kind of embarrassing and I thought of deleting the question, but I am leaving it in case it helps someone.

    The Dataproc capacity scheduler issue has been resolved, both for the dominant resource calculator and for the default resource calculator.

    I was seeing only one container with one core because I had mistyped dynamicAllocation as dynamicAllocatoin while disabling it. Dynamic allocation therefore remained enabled and was killing the containers when I was not using them, and the YARN UI was indeed reporting the numbers correctly.
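The lesson for me: Spark accepts arbitrary config keys, so a misspelled property is silently ignored rather than rejected. A small sanity check along these lines would have caught the typo (a sketch: the helper function and the intended-keys set are mine; only the property names are real).

```python
def check_conf(conf: dict, intended_keys: set) -> list:
    """Return spark.* config keys that are not in the intended set,
    i.e. likely typos that Spark would silently accept."""
    return [k for k in conf if k.startswith("spark.") and k not in intended_keys]

# The property I meant to set vs. the one I actually set:
intended = {"spark.dynamicAllocation.enabled"}
conf = {"spark.dynamicAllocatoin.enabled": "false"}  # the actual typo

suspicious = check_conf(conf, intended)
print(suspicious)  # -> ['spark.dynamicAllocatoin.enabled']
```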