Please make sure dynamic allocation is not killing your containers while you monitor the YARN UI. See the answer below.
Issue: I can start the SparkSession with any number of cores per executor, and YARN still reports an allocation of only one core per container. I have tried the solutions available online, given here, here, etc.
The suggested solution is:
In yarn-site.xml, to use capacity scheduling:
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
In capacity-scheduler.xml, to use dominant resource scheduling:
<property>
<name>yarn.scheduler.capacity.resource-calculator</name>
<value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
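To double-check that a setting like the one above is actually what the file contains, a small script can parse the config and read the property back. This is only a minimal sketch: the helper name `read_hadoop_properties` and the inline `SAMPLE` text are made up for illustration, and it assumes the usual Hadoop layout of `<property>` elements inside a `<configuration>` root. In practice you would read the real capacity-scheduler.xml from wherever your cluster keeps it.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the contents of capacity-scheduler.xml.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>
</configuration>"""


def read_hadoop_properties(xml_text):
    """Return a {name: value} dict for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


props = read_hadoop_properties(SAMPLE)
calculator = props.get("yarn.scheduler.capacity.resource-calculator", "")
# True when the dominant resource calculator is the configured one.
uses_drf = calculator.endswith("DominantResourceCalculator")
print(uses_drf)
```

This only confirms what is written in the file. It says nothing about whether the ResourceManager has been restarted or refreshed to pick the value up.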
However, the YARN UI still shows that the cluster is allocating only one core per executor.
I have read the answer here, which says that this is a known bug in the capacity scheduler and that the solution is to configure YARN to use fair scheduling. It also says that fair scheduling would be unnecessarily complicated, that what gets displayed in the YARN UI is merely a reporting issue, and that the executors actually do have the right number of cores allocated. But that answer is 5 years old, and I would assume that such a bug would have been resolved in the meantime.
So, I am asking this question to see whether the bug still persists, whether my understanding of the issue is wrong, or whether I am doing something wrong and the issue can now be resolved without getting into the weeds of fair scheduling.
This is kind of embarrassing, and I thought of deleting the question, but I am leaving it up in case it helps someone.
The Dataproc capacity scheduler issue has been resolved, both for the dominant resource calculator and for the default resource calculator.
I was seeing only one container with one core in it because I had mistyped dynamicAllocation as dynamicAllocatoin while disabling it. Dynamic allocation was therefore still active and was killing the containers when I was not using them, so the YARN UI was indeed reporting the right numbers.
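Since a misspelled key like this is easy to miss, here is a rough sketch of a pre-flight check that flags configuration keys that look like typos of known ones. It is an illustration under assumptions, not part of Spark: `find_suspect_keys` and the short `KNOWN_SPARK_KEYS` list are hypothetical, the list is deliberately incomplete, and the full key name `spark.dynamicAllocation.enabled` is my guess at which setting was mistyped. It uses only `difflib` from the Python standard library.

```python
import difflib

# Hypothetical, deliberately incomplete list of real Spark keys; extend as needed.
KNOWN_SPARK_KEYS = [
    "spark.dynamicAllocation.enabled",
    "spark.dynamicAllocation.minExecutors",
    "spark.dynamicAllocation.maxExecutors",
    "spark.executor.cores",
    "spark.executor.instances",
    "spark.executor.memory",
]


def find_suspect_keys(conf, known_keys=KNOWN_SPARK_KEYS, cutoff=0.8):
    """Map each unknown key that closely resembles a known key to that known key."""
    suspects = {}
    for key in conf:
        if key in known_keys:
            continue
        matches = difflib.get_close_matches(key, known_keys, n=1, cutoff=cutoff)
        if matches:
            suspects[key] = matches[0]
    return suspects


# The settings you intend to pass to the SparkSession builder.
conf = {
    "spark.executor.cores": "4",
    "spark.dynamicAllocatoin.enabled": "false",  # typo, as in the post above
}

suspects = find_suspect_keys(conf)
for bad, suggestion in suspects.items():
    print(f"{bad!r} is not a known key; did you mean {suggestion!r}?")
```

Running a check like this on the settings dict before building the session would have pointed at the misspelled key straight away. Keys that are unknown but not close to anything in the list are ignored, so custom settings do not trigger false alarms.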