Tags: hadoop, memory, hadoop-yarn

How to make Hadoop YARN faster with memory and vcore configuration?


On Hadoop YARN, if I have more containers available to run map or reduce tasks, will a job be processed faster?

So if that's true, then by making the container memory allocation smaller than the default, I can get more containers running on the host and make the job faster.

And what about vcores? If we have more containers to run, will they execute one by one according to the vcore allocation? In other words, whether there are many containers or few, do they still run one at a time?


Solution

  • No — tasks do not run one by one; they run in parallel.

    Let's say your YARN cluster has 24 cores and 96 GB of memory. The default value of both mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores is 1.

    So you can launch 24 containers with 4 GB of memory each, and they will run in parallel. If your job needs more than 24 containers, the first 24 tasks are launched initially, and subsequent tasks are launched as soon as the required resources (containers) become available.
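    As a sketch, the sizing in the example above corresponds to configuration like the following. The property names are the standard YARN/MapReduce ones; the numeric values are assumptions taken from the example's 96 GB / 24-core node and 4 GB containers, so adjust them to your hardware:

    ```xml
    <!-- yarn-site.xml: resources each NodeManager advertises to YARN
         (example node: 96 GB of memory, 24 vcores) -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>98304</value> <!-- 96 GB -->
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>24</value>
    </property>

    <!-- mapred-site.xml: per-task container size.
         96 GB / 4 GB per container = up to 24 containers in parallel -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.map.cpu.vcores</name>
      <value>1</value> <!-- default -->
    </property>
    <property>
      <name>mapreduce.reduce.cpu.vcores</name>
      <value>1</value> <!-- default -->
    </property>
    ```

    Note that with the default scheduler resource calculator, container allocation is typically driven by memory; vcores only constrain scheduling if the scheduler is configured to account for CPU (e.g. the Capacity Scheduler's DominantResourceCalculator).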