Tags: hadoop, hdfs, hdp, namenode

How to remove the very large files under /hadoop/hdfs/journal/hdfsha/current/


In our HDP cluster (version 2.6.5, managed by Ambari),

we noticed that the /hadoop/hdfs/journal/hdfsha/current/ folder contains more than 1000 very large files, for example:

-rw-r--r--. 1 hdfs hadoop 494690334 Dec 28 11:37 edits_0000000017251672645-0000000017253719335
-rw-r--r--. 1 hdfs hadoop 524892007 Dec 28 12:37 edits_0000000017253719336-0000000017255810613
-rw-r--r--. 1 hdfs hadoop 509365350 Dec 28 14:17 edits_0000000017255810614-0000000017258005682
-rw-r--r--. 1 hdfs hadoop 526756290 Dec 28 15:07 edits_0000000017258005683-0000000017260117992
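
For reference, a quick way to see the total size and file count of the journal directory (run on a JournalNode host; the path is the one from the listing above):

# total size of the JournalNode edits directory
du -sh /hadoop/hdfs/journal/hdfsha/current/
# number of files in it
ls /hadoop/hdfs/journal/hdfsha/current/ | wc -l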

In order to minimize the journal edit logs, we are considering the following settings in hdfs-site.xml.

We are not sure whether dfs.namenode.num.extra.edits.retained means that only 100 files will be retained.

Please advise whether the following configuration can help purge the extra journal files:

dfs.namenode.num.extra.edits.retained=100
dfs.namenode.max.extra.edits.segments.retained=1
dfs.namenode.num.checkpoints.retained=1
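
For reference, the values currently in effect can be checked on a NameNode host with hdfs getconf (a small sketch; the keys are the ones listed above):

hdfs getconf -confKey dfs.namenode.num.extra.edits.retained
hdfs getconf -confKey dfs.namenode.max.extra.edits.segments.retained
hdfs getconf -confKey dfs.namenode.num.checkpoints.retained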

Reference: https://www.ibm.com/support/pages/how-remove-very-large-files-under-hadoophdfsnamecurrent-biginsights-30-save-disk-space


Solution

  • To clear out the space consumed by the journal edits, you are on the right track. However, those values are too low, and if something goes wrong you might lose data.

    The defaults for dfs.namenode.num.extra.edits.retained and dfs.namenode.max.extra.edits.segments.retained are 1000000 and 10000 respectively. Note that dfs.namenode.num.extra.edits.retained is counted in transactions (edits), not files, while dfs.namenode.max.extra.edits.segments.retained limits the number of extra edit log segment files that are kept.

    I would suggest the following values:

    dfs.namenode.num.extra.edits.retained=100000
    dfs.namenode.max.extra.edits.segments.retained=100
    dfs.namenode.num.checkpoints.retained=2
    

    You can find the details of all these parameters in the HDFS documentation; there is no single correct value, so choose whatever suits your environment. One way to apply the change and trigger the purge is sketched below.
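
    A rough sketch of one way to make the purge take effect, assuming an HA cluster managed by Ambari and that a short safe-mode window is acceptable (the dfsadmin commands are standard, but verify the procedure against your own environment): set the three properties (for example via Custom hdfs-site in Ambari), restart both NameNodes, and then force a checkpoint so that edit segments beyond the retention limits are purged from the JournalNodes:

    # enter safe mode; saveNamespace requires it
    hdfs dfsadmin -safemode enter

    # force a checkpoint (new fsimage); after the checkpoint the NameNode
    # purges edit segments beyond the configured retention limits
    hdfs dfsadmin -saveNamespace

    # leave safe mode so HDFS accepts writes again
    hdfs dfsadmin -safemode leave

    In an HA setup the standby NameNode also checkpoints periodically on its own, so after restarting with the new settings the old segments should eventually be purged even without a manual checkpoint.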