I'm using Ganglia 3.7.2 to monitor a Hadoop (2.6.0-cdh5.4.0) cluster of 7 servers, with metrics2 enabled on Hadoop and HBase. I installed gmetad on one server and gmond on the other servers with yum. At first the monitoring worked well and I could see normal data on the Ganglia web page. The problem is that after several hours there were so many RRD files that I had to make /var/lib/ganglia/rrds a symbolic link, and after a couple of days the RRD files occupied almost 1 TB of disk space and the web page could no longer show the monitoring data. Does anybody know how to fix this?
gmond config (using single channel):
globals {
daemonize = yes
setuid = yes
user = ganglia
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = no
allow_extra_data = yes
host_dmax = 86400 /*secs. Expires (removes from web interface) hosts in 1 day */
host_tmax = 20 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
send_metadata_interval = 60 /*secs */
}
I resolved this problem by doing the following:
Changed the default RRD rule (the RRAs setting in gmetad.conf) to:
RRAs "RRA:AVERAGE:0.5:1:256" "RRA:AVERAGE:0.5:24:512" "RRA:AVERAGE:0.5:168:512" "RRA:AVERAGE:0.5:672:512" "RRA:AVERAGE:0.5:5760:1024"
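To see what the RRAs rule above means in practice, here is a rough sketch that works out how long each archive keeps data and how many rows each metric file holds. It assumes a base RRD step of 15 seconds and roughly 8 bytes per row per data source, ignoring file header overhead; both numbers are assumptions, so adjust them to your setup.

```python
# Assumption: the base RRD step is 15 seconds; change STEP_SECONDS if yours differs.
STEP_SECONDS = 15
# Assumption: about 8 bytes per row per data source, header overhead ignored.
BYTES_PER_ROW = 8

RRAS = [
    "RRA:AVERAGE:0.5:1:256",
    "RRA:AVERAGE:0.5:24:512",
    "RRA:AVERAGE:0.5:168:512",
    "RRA:AVERAGE:0.5:672:512",
    "RRA:AVERAGE:0.5:5760:1024",
]


def parse_rra(spec):
    """Split 'RRA:CF:xff:steps:rows' and return (steps, rows) as integers."""
    _tag, _cf, _xff, steps, rows = spec.split(":")
    return int(steps), int(rows)


def retention_seconds(spec, step=STEP_SECONDS):
    """Time span covered by one archive: steps per row * rows * base step."""
    steps, rows = parse_rra(spec)
    return steps * rows * step


def total_rows(specs):
    """Total number of rows stored per data source across all archives."""
    return sum(parse_rra(spec)[1] for spec in specs)


if __name__ == "__main__":
    for spec in RRAS:
        days = retention_seconds(spec) / 86400
        print(f"{spec}: keeps about {days:.2f} days")
    rows = total_rows(RRAS)
    print(f"rows per data source: {rows}, approx. data bytes: {rows * BYTES_PER_ROW}")
```

Fewer rows per archive means a smaller file per metric, so the total disk usage scales with the number of metrics times this per-file size.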
Disabled the datanode.sink and nodemanager.sink entries in Hadoop metrics2.
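To check which directories hold most of the RRD files (for example before and after disabling the sinks), a small script along these lines can help. It is only a sketch: the root path is the one from the question, and `rrd_usage` is a hypothetical helper name, not part of Ganglia.

```python
import os
from collections import defaultdict


def rrd_usage(root):
    """Return {directory: (file_count, total_bytes)} for *.rrd files under root."""
    usage = defaultdict(lambda: [0, 0])
    # os.walk follows root itself if it is a symlink, but not symlinked subdirectories.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".rrd"):
                entry = usage[dirpath]
                entry[0] += 1
                entry[1] += os.path.getsize(os.path.join(dirpath, name))
    return {directory: (count, size) for directory, (count, size) in usage.items()}


if __name__ == "__main__":
    root = "/var/lib/ganglia/rrds"  # path taken from the question
    ranked = sorted(rrd_usage(root).items(), key=lambda kv: kv[1][1], reverse=True)
    # Print the ten directories that use the most space.
    for directory, (count, size) in ranked[:10]:
        print(f"{size / 2**20:10.1f} MiB  {count:6d} files  {directory}")
```

Directories with an unusually high file count point at the daemons that emit the most metrics.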
Reference: http://www.perzl.org/ganglia/bestpractices.html##Best_Practices_Ganglia_Sampling