Can someone help me understand the relationship between JVMs and containers in YARN?
Pointers to some useful links would also be helpful.
Is it one JVM for each container? Multiple containers in a single JVM? Or is there no relation between JVMs and containers at all?
Of course there is a relation, and it's one-to-one: for each container that needs to be created, a new Java process (a JVM) is spawned.
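One caveat to "one-to-one": uber mode (which you mention below) is the exception, where a small job's tasks all run inside the ApplicationMaster's own JVM instead of separate containers. As a minimal sketch, assuming the standard Hadoop 2.x property names, this is how a client would toggle it:

```java
import org.apache.hadoop.conf.Configuration;

public class UberModeSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Off by default; when enabled, small jobs run "uberized" in the AM's JVM.
        conf.setBoolean("mapreduce.job.ubertask.enable", true);
        // A job only actually runs uberized if it is small enough:
        conf.setInt("mapreduce.job.ubertask.maxmaps", 9);    // default 9
        conf.setInt("mapreduce.job.ubertask.maxreduces", 1); // only 0 or 1 reduces supported
        System.out.println("uber enabled: "
                + conf.getBoolean("mapreduce.job.ubertask.enable", false));
    }
}
```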
Now, if you are not running in uber mode, consider the following:
How are JVMs created? Is it one JVM for each task? Can multiple tasks run in the same JVM at the same time? (I'm aware of ubertasking, where many tasks (maps/reduces) can run in the same JVM one after the other.)
See, tasks are scheduled to run on some node in the cluster. The capacity of a container is decided according to the task's requirements (memory and CPU). There are also configuration parameters for this, which you can find in the links below; a sketch follows this answer.
Each task attempt is scheduled on its own JVM.
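To make those parameters concrete, here is a minimal sketch, assuming the standard Hadoop 2.x property names (the values are purely illustrative, not recommendations):

```java
import org.apache.hadoop.conf.Configuration;

public class ContainerSizingSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // Per-task container capacity requested from YARN (illustrative values):
        conf.setInt("mapreduce.map.memory.mb", 2048);    // container size for a map task
        conf.setInt("mapreduce.reduce.memory.mb", 4096); // container size for a reduce task
        conf.setInt("mapreduce.map.cpu.vcores", 1);
        conf.setInt("mapreduce.reduce.cpu.vcores", 1);

        // Heap of the JVM launched inside each container; conventionally
        // around 80% of the container memory so non-heap usage also fits:
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");
        conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");

        // Cluster-side bounds the scheduler rounds requests into are set in
        // yarn-site.xml: yarn.scheduler.minimum-allocation-mb and
        // yarn.scheduler.maximum-allocation-mb (mentioned here for completeness).
    }
}
```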
When the ResourceManager allocates containers for a job, do multiple tasks of the same job share a container when they run on the same node, or does each task get a separate container based on availability?
Separate containers for each task are spawned based on resource availability in the cluster.
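To illustrate "separate containers per task", here is a rough sketch of how an ApplicationMaster asks the ResourceManager for one container per task, assuming the Hadoop 2.x AMRMClient API; the task count and sizes are made up, and the real AM registration/heartbeat plumbing is omitted:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

public class AmContainerRequestSketch {
    public static void main(String[] args) throws Exception {
        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(new Configuration());
        rmClient.start();
        rmClient.registerApplicationMaster("", 0, ""); // host/port/URL omitted in this sketch

        int numTasks = 4; // hypothetical: a job with 4 map tasks
        for (int i = 0; i < numTasks; i++) {
            // One request per task; each granted container will host one task JVM.
            Resource capability = Resource.newInstance(1024, 1); // 1024 MB, 1 vcore
            rmClient.addContainerRequest(
                    new ContainerRequest(capability, null, null, Priority.newInstance(0)));
        }

        // The AM would then call rmClient.allocate(progress) in a heartbeat loop;
        // the RM hands back containers as capacity frees up, and the AM launches
        // one JVM in each via NMClient.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}
```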
Here are some links which are very helpful:
http://ercoppa.github.io/HadoopInternals/AnatomyMapReduceJob.html
https://blog.cloudera.com/blog/2015/09/untangling-apache-hadoop-yarn-part-1/
http://blog.cloudera.com/blog/2014/04/apache-hadoop-yarn-avoiding-6-time-consuming-gotchas/