hadoop hdfs namenode

NameNode in a Hadoop cluster and the fsimage and edit log concept


First, a short background on the NameNode, the fsimage and the edit logs, and on how the NameNode works in a Hadoop cluster.

The NameNode stores modifications to the file system as a log appended to a native file system file, edits.

When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file.

It then writes new HDFS state to the fsimage and starts normal operation with an empty edits file.

The FsImage is a file stored on the OS filesystem that contains the complete directory structure (namespace) of HDFS, including which blocks make up each file. Which DataNode holds each block is not persisted in the FsImage; the NameNode rebuilds that information from DataNode block reports.

The EditLog is a transaction log that records changes to the HDFS file system, that is, any action performed on the cluster such as the addition of a new block, replication, deletion and so on. It records the changes made since the last FsImage was created. At a checkpoint these changes are merged into the FsImage to create a new FsImage file.

When the NameNode starts, the latest FsImage file is loaded into memory. If the FsImage does not contain up-to-date information, the EditLog entries written after it are then replayed on top of it.

The NameNode keeps the metadata in memory in order to serve multiple client requests as fast as possible.

If it did not, the NameNode would have to read the metadata from disk for every operation, and each operation would pay for the extra disk seeks.

So let's summarize.


Persistence of HDFS metadata broadly consists of two categories of files:

fsimage

Contains the complete state of the file system at a point in time. Every file system modification is assigned a unique, monotonically increasing transaction ID. An fsimage file represents the file system state after all modifications up to a specific transaction ID.

edits file

Contains a log that lists each file system change (file creation, deletion or modification) that was made after the most recent fsimage.

Checkpointing

The process of merging the content of the most recent fsimage with all edits applied after that fsimage was created, in order to produce a new fsimage. Checkpointing is triggered automatically by configuration policies or manually by HDFS administration commands.
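To see which automatic checkpoint policy a cluster is actually running with, the two standard triggers can be read from the client configuration. This is only a minimal sketch: `show_checkpoint_policy` is a name made up for this example, and it assumes the `hdfs` client is on the PATH and reads the same hdfs-site.xml as the NameNodes.

```shell
# Hypothetical helper: print the two checkpoint triggers from the local HDFS config.
#   dfs.namenode.checkpoint.period = maximum seconds between two checkpoints
#   dfs.namenode.checkpoint.txns   = uncheckpointed transactions that force a checkpoint
show_checkpoint_policy() {
    for key in dfs.namenode.checkpoint.period dfs.namenode.checkpoint.txns; do
        printf '%s = %s\n' "$key" "$(hdfs getconf -confKey "$key")"
    done
}

# Usage (on a node with the HDFS client configured):
#   show_checkpoint_policy
```

Stock Apache Hadoop defaults are 3600 seconds and 1000000 transactions, but a distribution may ship different values. A checkpoint is due as soon as either limit is reached, and in an HA pair it is the standby NameNode that performs it.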


That is the brief background on the NameNode and the edit logs.

Now let's talk about our cluster (it is based on HDP version 2.6.5).

In the folder /var/hadoop/hdfs/namenode/current on each NameNode, we have the following fsimage files:

fsimage_0000000000000031788        100%  104KB 104.1KB/s   00:00
fsimage_0000000000000031788.md5    100%   62     0.1KB/s   00:00
fsimage_0000000000000041641        100%  104KB 104.1KB/s   00:00
fsimage_0000000000000041641.md5    100%   62     0.1KB/s   00:00

and also the edit logs:

 .
 .
 .

-rw-r--r--  1 hdfs hadoop  328138542 Jan 23 12:37 edits_0000000022056979997-0000000022059239786
-rw-r--r--  1 hdfs hadoop  301415558 Jan 23 13:07 edits_0000000022059239787-0000000022061345588
-rw-r--r--  1 hdfs hadoop  311747850 Jan 23 13:37 edits_0000000022061345589-0000000022063490851
-rw-r--r--  1 hdfs hadoop         12 Jan 23 13:37 seen_txid
-rw-r--r--  1 hdfs hadoop  330301440 Jan 24 07:10 edits_0000000022063490852-0000000022065448335
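The backlog that a restart has to replay can be estimated from the segment file names alone, since each finalized segment is named after its first and last transaction ID. The sketch below is an illustration, not an official tool: `count_edit_txns` is a hypothetical helper, and it assumes finalized segments follow the `edits_<firstTxId>-<lastTxId>` pattern seen in the listing above.

```shell
# Hypothetical helper: count finalized edit segments in a directory and the
# transactions they cover. Runs in a subshell so its variables do not leak.
count_edit_txns() (
    dir=$1
    total=0
    segments=0
    for f in "$dir"/edits_*-*; do
        [ -e "$f" ] || continue              # glob matched nothing
        range=${f##*/edits_}                 # "<firstTxId>-<lastTxId>"
        # Strip leading zeros so the shell does not treat the IDs as octal.
        first=$(printf '%s\n' "${range%-*}" | sed 's/^0*//')
        last=$(printf '%s\n' "${range#*-}" | sed 's/^0*//')
        total=$((total + ${last:-0} - ${first:-0} + 1))
        segments=$((segments + 1))
    done
    echo "$segments segments, $total transactions"
)

# Usage:
#   count_edit_txns /var/hadoop/hdfs/namenode/current
```

For the four segments shown in the listing above this prints `4 segments, 8468339 transactions`. Files without a transaction range in their name, such as seen_txid, are skipped.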

Now we start both NameNodes.

In the NameNode logs we see that the NameNode replays each of the edit logs (so if, for example, we have 1965 edit logs, the NameNode replays all of them one by one).

Example:

2020-01-27 06:20:37,306 INFO  namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(266)) - replaying edit log: 2072759/2282427 transactions completed. (91%)
2020-01-27 06:20:38,307 INFO  namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(266)) - replaying edit log: 2214991/2282427 transactions completed. (97%)
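During a long replay it can help to pull the latest progress line out of the log instead of reading it by eye. The following is a rough sketch: `replay_progress` is a hypothetical helper, the log path is a placeholder, and the pattern is written against the two sample lines above, so it may need adjusting for other Hadoop versions.

```shell
# Hypothetical helper: print the most recent "replaying edit log" progress
# found in a NameNode log file.
replay_progress() {
    grep 'replaying edit log:' "$1" | tail -n 1 |
        sed -n 's|.*replaying edit log: \([0-9]*\)/\([0-9]*\) transactions completed\. (\([0-9]*\)%).*|\1 of \2 transactions (\3%)|p'
}

# Usage (path is a placeholder):
#   replay_progress /path/to/namenode.log
```

The totals in the sample lines are about the size of one edit segment, so the percentage appears to describe the segment currently being replayed rather than the whole startup.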

So the NameNodes only finished starting, in active/standby state, after replaying all 1965 edit logs, and this took almost 17 hours.

After restarting both NameNodes, we expected to get up-to-date fsimage files.

For example:

-rw-r--r-- 1 hdfs hadoop  445716 Jan 31 08:11 fsimage_0000000000000132222
-rw-r--r-- 1 hdfs hadoop      62 Jan 31 08:11 fsimage_0000000000000132222.md5

But in our case, after restarting both NameNodes, we got the following (the fsimage was not updated; its timestamp is still Jan 03):

-rw-r--r-- 1 hdfs hadoop  445716 Jan 03 07:11 fsimage_0000000000000132222
-rw-r--r-- 1 hdfs hadoop      62 Jan 03 07:11 fsimage_0000000000000132222.md5

So we can see that the fsimage was not updated, even though both NameNodes started completely (after 17 hours) and are in active/standby state.

Any suggestions as to why the fsimage was not updated to the current time?


Solution

  • You can create an fsimage file by running the checkpoint manually with these commands:

    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
    hdfs dfsadmin -safemode leave
    

    IMPORTANT: while these commands run, HDFS is in safe mode and is not available for normal use, so make sure HA is active and your clients are aware of this pause (it can take around 5 minutes or more to complete).
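The three commands above can be wrapped so that safe mode is left again even when the checkpoint itself fails. This is a sketch under assumptions, not a vetted procedure: `manual_checkpoint` is a hypothetical name, and it assumes the caller has HDFS superuser rights (for example by running as the hdfs user).

```shell
# Hypothetical wrapper around the three commands above.
# It always tries to leave safe mode, even if saveNamespace fails.
manual_checkpoint() {
    hdfs dfsadmin -safemode enter || return 1
    rc=0
    hdfs dfsadmin -saveNamespace || rc=$?     # remember the failure, keep going
    hdfs dfsadmin -safemode leave || return 1
    return "$rc"
}

# Usage:
#   manual_checkpoint && ls -l /var/hadoop/hdfs/namenode/current | grep fsimage
```

A non-zero return value means one of the steps failed; in that case check the safe mode state by hand with `hdfs dfsadmin -safemode get` before letting clients back in.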