Current Setup
Problem statement
as our cluster is extensively used by data scientists and analysts, having just 24 containers does not suffice. This leads to heavy resource contention.
Is there any way we can increase number of containers?
Options we are considering
Requests
Vcores are just a logical unit and not in anyway related to a CPU core unless you are using YARN with CGroups and have yarn.nodemanager.resource.percentage-physical-cpu-limit
enabled. Most tasks are rarely CPU-bound but more typically network I/O bound. So if you were to look at your cluster's overall CPU utilization and memory utilization, you should be able to resize your containers based on the wasted (spare) capacity.
You can measure utilization with a host of tools but sar
, ganglia
and grafana
are the obvious ones but you can also look at Brendan Gregg's Linux Performance tools for more ideas.