hadoop, mapreduce, hadoop-yarn, apache-tez, planning

Suggestions for increasing utilization of YARN containers on our discovery cluster


Current Setup

Problem statement

Options we are considering

Requests

  1. Is there any other way to manage our discovery cluster?
  2. Is there any possibility of reducing the container size?
  3. Can a vcore (as it's a logical concept) be shared by multiple containers?

Solution

  • Vcores are just a logical unit and are not in any way tied to a physical CPU core unless you are using YARN with CGroups and have yarn.nodemanager.resource.percentage-physical-cpu-limit enabled. Most tasks are rarely CPU-bound; they are more typically network- or disk-I/O-bound. So if you look at your cluster's overall CPU and memory utilization, you should be able to resize your containers based on the wasted (spare) capacity.
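    As a rough sketch (not a drop-in config), the yarn-site.xml properties below show both levers: CPU enforcement via CGroups, and smaller minimum allocations so containers can be right-sized. The values are illustrative placeholders you would tune from your own utilization numbers, and the CGroups settings assume a Linux cluster using the LinuxContainerExecutor.

        <!-- yarn-site.xml: enforce CPU limits with CGroups (values are illustrative) -->
        <property>
          <name>yarn.nodemanager.container-executor.class</name>
          <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
        </property>
        <property>
          <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
          <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
        </property>
        <property>
          <!-- Cap YARN containers at 80% of the node's physical CPU -->
          <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
          <value>80</value>
        </property>

        <!-- Container sizing: lower the minimum allocation so small tasks waste less -->
        <property>
          <name>yarn.scheduler.minimum-allocation-mb</name>
          <value>1024</value>
        </property>
        <property>
          <name>yarn.scheduler.maximum-allocation-mb</name>
          <value>8192</value>
        </property>
        <property>
          <name>yarn.scheduler.minimum-allocation-vcores</name>
          <value>1</value>
        </property>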

    You can measure utilization with a host of tools; sar, Ganglia, and Grafana are the obvious ones, and Brendan Gregg's Linux Performance tools page offers many more ideas.
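    For instance, a minimal sar session (from the sysstat package) could sample an hour of normal load; the interval and count here are arbitrary:

        # Sample CPU utilization every 60 seconds, 60 times (one hour)
        sar -u 60 60

        # Sample memory utilization over the same window
        sar -r 60 60

        # Watch %idle in the CPU report and %memused in the memory report:
        # consistently high %idle and low %memused suggest oversized containers.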