java hadoop ambari datanode

How to tune the "DataNode maximum Java heap size" in Hadoop clusters


I searched Google for information on how to tune the value of DataNode maximum Java heap size. Apart from these two pages:

https://community.hortonworks.com/articles/74076/datanode-high-heap-size-alert.html

https://docs.oracle.com/cd/E19900-01/819-4742/abeik/index.html

I did not find any formula for calculating the value of DataNode maximum Java heap size.

The default value of DataNode maximum Java heap size is 1 GB.

We increased this value to 5 GB because in some cases we saw heap-size errors in the DataNode logs.

But this isn't the right way to tune the value.

So, does anyone have a suggestion or a good article on how to set the right value for DataNode maximum Java heap size?

Let's say we have the following Hadoop cluster:

  1. 10 DataNode machines, each with 5 disks of 1 TB

  2. Each DataNode has 32 CPUs

  3. Each DataNode has 256 GB of memory

Based on this information, can we find a formula that gives the right value for DataNode maximum Java heap size?
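There is no exact formula, but the usual reasoning is that DataNode heap grows mainly with the number of block replicas the DataNode tracks in memory, not with its CPU count or total RAM. The sketch below turns the cluster above into a replica estimate and applies one commonly quoted vendor rule of thumb, which is an assumption here and should be checked against your distribution's sizing guide: a 4 GB floor plus about 1 GB per million replicas above 4 million. The 128 MiB block size (the `dfs.blocksize` default in Hadoop 2 and later) and the 1 MiB small-file average are also assumptions.

```python
import math

# ASSUMED rule of thumb, not a Hadoop formula: 4 GB minimum heap, plus
# about 1 GB for every million replicas above 4 million.
BASE_HEAP_GB = 4
BASE_REPLICAS = 4_000_000
REPLICAS_PER_EXTRA_GB = 1_000_000


def replicas_per_node(disks: int, disk_bytes: int, avg_block_bytes: int) -> int:
    """Upper bound on the replicas one DataNode holds once its disks are full."""
    return (disks * disk_bytes) // avg_block_bytes


def estimate_heap_gb(replicas: int) -> int:
    """Heap in whole GB under the assumed rule, rounding partial millions up."""
    extra = max(0, replicas - BASE_REPLICAS)
    return BASE_HEAP_GB + math.ceil(extra / REPLICAS_PER_EXTRA_GB)


if __name__ == "__main__":
    disk = 10**12  # "1T" disk, taken as a decimal terabyte
    # Assumed block sizes: full 128 MiB blocks vs. a small-file-heavy 1 MiB average.
    for label, block in [("128 MiB blocks", 128 * 2**20), ("1 MiB average", 2**20)]:
        n = replicas_per_node(5, disk, block)
        print(f"{label}: {n:,} replicas -> {estimate_heap_gb(n)} GB heap")
```

With full 128 MiB blocks each node holds only about 37 thousand replicas, so the 4 GB floor applies; only a cluster dominated by small files gets near the millions of replicas per node that would justify more heap under this rule.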

Regarding Hortonworks: they advise setting the DataNode Java heap to 4 GB, but I am not sure this covers all scenarios:

ROOT CAUSE: DN operations are IO-expensive and do not require 16GB of heap.

https://community.hortonworks.com/articles/74076/datanode-high-heap-size-alert.html

RESOLUTION: Tuning GC parameters resolved the issue.
4GB heap recommendation:
-Xms4096m -Xmx4096m -XX:NewSize=800m 
-XX:MaxNewSize=800m -XX:+UseParNewGC 
-XX:+UseConcMarkSweepGC 
-XX:+UseCMSInitiatingOccupancyOnly 
-XX:CMSInitiatingOccupancyFraction=70 
-XX:ParallelGCThreads=8 
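If you settle on a heap size other than 4 GB, one way to keep these flags consistent is to generate them. The helper below is hypothetical (`datanode_gc_opts` is not part of Hadoop), and it scales the young generation at the same 800/4096 ratio as the recommendation, which is an assumption rather than a rule. The CMS and ParNew flags only apply to Java 8-era JVMs; ParNew was removed in JDK 10 and CMS in JDK 14.

```python
def datanode_gc_opts(heap_mb: int = 4096, gc_threads: int = 8) -> str:
    """Build a JVM option string shaped like the 4 GB recommendation.

    Assumption: young generation kept at the recommendation's ratio
    (800 MB of 4096 MB). Java 8 flags only.
    """
    new_mb = round(heap_mb * 800 / 4096)
    flags = [
        f"-Xms{heap_mb}m",  # fixed heap: min == max avoids resize pauses
        f"-Xmx{heap_mb}m",
        f"-XX:NewSize={new_mb}m",
        f"-XX:MaxNewSize={new_mb}m",
        "-XX:+UseParNewGC",
        "-XX:+UseConcMarkSweepGC",
        "-XX:+UseCMSInitiatingOccupancyOnly",
        "-XX:CMSInitiatingOccupancyFraction=70",
        f"-XX:ParallelGCThreads={gc_threads}",
    ]
    return " ".join(flags)


if __name__ == "__main__":
    # HADOOP_DATANODE_OPTS is the Hadoop 2 name; Hadoop 3 uses HDFS_DATANODE_OPTS.
    print(f'export HADOOP_DATANODE_OPTS="{datanode_gc_opts(8192)} $HADOOP_DATANODE_OPTS"')
```

On an Ambari-managed cluster, the printed line belongs in the hadoop-env template in Ambari rather than in the file on disk, since Ambari rewrites hadoop-env.sh when it pushes configuration.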

Solution

  • In hadoop-env.sh (there is also a field in Ambari; just try searching for "heap") there's an option for setting the value. In the shell file it might be called HADOOP_DATANODE_OPTS.

    8GB is generally a good value for most servers. You have plenty of memory, though, so I would start there and actively monitor the usage via JMX metrics, in Grafana for example.

    The NameNode might need adjusting as well: https://community.hortonworks.com/articles/43838/scaling-the-hdfs-namenode-part-1.html
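For the JMX monitoring suggested above, Hadoop daemons serve their JMX beans as JSON on the `/jmx` endpoint of their web UI. The sketch below reads the standard `java.lang:type=Memory` bean and reports heap use as a fraction of `-Xmx`. The host name is hypothetical, and the port is an assumption: the DataNode HTTP port defaults to 50075 in Hadoop 2 and 9864 in Hadoop 3.

```python
from __future__ import annotations

import json
import urllib.request

# Hypothetical DataNode host; 9864 is the Hadoop 3 default HTTP port
# (50075 on Hadoop 2). Adjust both for your cluster.
JMX_URL = "http://datanode1.example.com:9864/jmx?qry=java.lang:type=Memory"


def heap_usage(payload: dict) -> tuple[int, int, float]:
    """Return (used_bytes, max_bytes, used_fraction) from a /jmx JSON payload."""
    for bean in payload.get("beans", []):
        if bean.get("name") == "java.lang:type=Memory":
            heap = bean["HeapMemoryUsage"]
            used, limit = heap["used"], heap["max"]
            if limit <= 0:  # the JVM reports -1 when no maximum is defined
                raise ValueError("heap max is undefined")
            return used, limit, used / limit
    raise ValueError("java.lang:type=Memory bean not found in JMX payload")


def fetch_jmx(url: str = JMX_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the JSON served by a Hadoop daemon's /jmx servlet."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def report(url: str = JMX_URL) -> str:
    """One-line heap summary for a single DataNode."""
    used, limit, frac = heap_usage(fetch_jmx(url))
    return f"heap used {used / 2**30:.2f} GiB of {limit / 2**30:.2f} GiB ({frac:.0%})"
```

Call `report()` from a machine that can reach the DataNode. Sampling it over time (or graphing the same bean in Grafana) shows whether the low points of heap usage stay well below the limit, which is a better signal for raising or lowering the heap than a single reading.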