I searched Google for information on how to tune the value of DataNode maximum Java heap size, but apart from these two pages I found nothing:
https://community.hortonworks.com/articles/74076/datanode-high-heap-size-alert.html
https://docs.oracle.com/cd/E19900-01/819-4742/abeik/index.html
Neither gives a formula for calculating the right value of DataNode maximum Java heap size.
The default value for DataNode maximum Java heap size is 1G.
We increased it to 5G because in some cases we saw heap-size errors in the DataNode logs,
but guessing like that is not the right way to tune the value.
So, any suggestion or a good article on how to set the right value for DataNode maximum Java heap size?
Let's say we have the following Hadoop cluster:
10 DataNode machines, each with 5 disks of 1T
Each DataNode has 32 CPU cores
Each DataNode has 256G of memory
Based on this info, can we derive a formula that gives the right value for DataNode maximum Java heap size?
Regarding Hortonworks: they advise setting the DataNode Java heap to 4G, but I am not sure that covers every scenario. From the article:
ROOT CAUSE: DN operations are IO expensive and do not require 16GB of heap.
https://community.hortonworks.com/articles/74076/datanode-high-heap-size-alert.html
RESOLUTION: Tuning GC parameters resolved the issue.
4GB heap recommendation:
-Xms4096m -Xmx4096m -XX:NewSize=800m
-XX:MaxNewSize=800m -XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=70
-XX:ParallelGCThreads=8
In hadoop-env.sh (there is also a field for this in Ambari; just search for "heap"), there's an option for setting the value. It should be HADOOP_DATANODE_OPTS in the shell file.
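As a minimal sketch, applying the 4GB settings quoted above could look like this in hadoop-env.sh (the exact variable name and file location depend on your Hadoop/Ambari version, so treat this as an example rather than a drop-in config):

# hadoop-env.sh: append the GC settings from the Hortonworks article to the DataNode JVM options
export HADOOP_DATANODE_OPTS="-Xms4096m -Xmx4096m -XX:NewSize=800m -XX:MaxNewSize=800m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:ParallelGCThreads=8 ${HADOOP_DATANODE_OPTS}"

If you manage the cluster through Ambari, set the same flags in the corresponding DataNode heap/GC fields instead of editing the file by hand, so Ambari doesn't overwrite them.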
8GB is generally a good value for most servers, and you have plenty of memory, so I would start there and actively monitor the usage via JMX metrics (in Grafana, for example).
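If Grafana isn't wired up yet, you can spot-check the heap directly: the DataNode publishes JVM memory metrics through its HTTP JMX servlet. A sketch, assuming the default DataNode web port (50075 on Hadoop 2.x, 9864 on 3.x) and a placeholder hostname:

# Query the DataNode's JMX servlet for JVM heap usage; the JSON response includes HeapMemoryUsage (used/committed/max)
curl -s 'http://datanode-host:50075/jmx?qry=java.lang:type=Memory'

Watching how close "used" gets to "max" under real load tells you whether 4G or 8G is actually enough for your workload.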
The NameNode heap might need to be adjusted as well: https://community.hortonworks.com/articles/43838/scaling-the-hdfs-namenode-part-1.html