In our HDP 2.6.5 cluster (managed by Ambari), we noticed that the /hadoop/hdfs/journal/hdfsha/current/ folder contains more than 1000 files, many of them huge, for example:
-rw-r--r--. 1 hdfs hadoop 494690334 Dec 28 11:37 edits_0000000017251672645-0000000017253719335
-rw-r--r--. 1 hdfs hadoop 524892007 Dec 28 12:37 edits_0000000017253719336-0000000017255810613
-rw-r--r--. 1 hdfs hadoop 509365350 Dec 28 14:17 edits_0000000017255810614-0000000017258005682
-rw-r--r--. 1 hdfs hadoop 526756290 Dec 28 15:07 edits_0000000017258005683-0000000017260117992
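For reference, this is a quick way to check the scale of the problem (a sketch using the path from our cluster; adjust it to your JournalNode edits directory):

# total size of the JournalNode edits directory
du -sh /hadoop/hdfs/journal/hdfsha/current/
# number of edit log segments
ls /hadoop/hdfs/journal/hdfsha/current/ | grep -c '^edits_'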
In order to minimize the journal edit logs, we are considering the following settings in hdfs-site.xml. We are not sure whether dfs.namenode.num.extra.edits.retained means that only 100 files would be retained. Please advise whether the following configuration can help purge the extra journal files:
dfs.namenode.num.extra.edits.retained=100
dfs.namenode.max.extra.edits.segments.retained=1
dfs.namenode.num.checkpoints.retained=1
To clear out the space consumed by journal edits, you are on the right track. However, the values you propose are too low; if something goes wrong, you might lose data.
Note that dfs.namenode.num.extra.edits.retained is counted in transactions, not files: it is the number of extra transactions retained beyond what is minimally required to restart the NameNode, so a value of 100 would keep almost no history. The defaults for dfs.namenode.num.extra.edits.retained and dfs.namenode.max.extra.edits.segments.retained are 1000000 and 10000 respectively.
I would suggest the following values:
dfs.namenode.num.extra.edits.retained=100000
dfs.namenode.max.extra.edits.segments.retained=100
dfs.namenode.num.checkpoints.retained=2
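In hdfs-site.xml these would look roughly as follows (or set the same properties through the Ambari HDFS config UI; either way, the change only takes effect after restarting the NameNodes):

<property>
  <name>dfs.namenode.num.extra.edits.retained</name>
  <value>100000</value>
</property>
<property>
  <name>dfs.namenode.max.extra.edits.segments.retained</name>
  <value>100</value>
</property>
<property>
  <name>dfs.namenode.num.checkpoints.retained</name>
  <value>2</value>
</property>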
You can find the details of all these parameters in the hdfs-default.xml reference. There is no single correct value; you have to choose based on your environment.
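Also note that old edit segments are only purged when the NameNode checkpoints, so the cleanup is not immediate. If you need to trigger it right away, you can force a checkpoint manually (a sketch; run as the hdfs superuser, and be aware that safe mode blocks writes while it is on):

hdfs dfsadmin -safemode enter   # block writes so the namespace can be saved
hdfs dfsadmin -saveNamespace    # checkpoint; edits beyond the retention limits get purged
hdfs dfsadmin -safemode leave   # resume normal operation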