I want to give some short background on NameNodes and the fsimage/edit logs, and on how the NameNode works in Hadoop clusters.
The NameNode stores modifications to the file system as a log appended to a native file system file, edits. When a NameNode starts up, it reads HDFS state from an image file, fsimage, and then applies edits from the edits log file. It then writes the new HDFS state to the fsimage and starts normal operation with an empty edits file.
FsImage is a file stored on the OS filesystem that contains the complete directory structure (namespace) of HDFS, including which blocks make up each file. Which DataNode holds each block is not persisted in the FsImage; the NameNode learns that from DataNode block reports.
EditLogs is a transaction log that records the changes in the HDFS file system or any action performed on the HDFS cluster, such as addition of a new block, replication, deletion, etc. It records the changes since the last FsImage was created; at checkpoint time these changes are merged into the FsImage file to create a new FsImage file.
When we start the NameNode, the latest FsImage file is loaded into memory, and the EditLog file is also loaded into memory if the FsImage file does not contain up-to-date information.
The NameNode keeps metadata in memory in order to serve multiple client requests as fast as possible. If this were not done, then for every operation the NameNode would have to read the metadata from disk, which would cost disk seek time on every operation.
So let's summarize.
Persistence of HDFS metadata broadly consists of two categories of files:
fsimage
Contains the complete state of the file system at a point in time. Every file system modification is assigned a unique, monotonically increasing transaction ID. An fsimage file represents the file system state after all modifications up to a specific transaction ID.
edits file
Contains a log that lists each file system change (file creation, deletion or modification) that was made after the most recent fsimage.
Checkpointing is the process of merging the content of the most recent fsimage with all edits applied after that fsimage was written, to create a new fsimage. Checkpointing is triggered automatically by configuration policies or manually by HDFS administration commands.
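For reference, the "configuration policies" are, as far as I know, the two properties dfs.namenode.checkpoint.period (default 3600 seconds) and dfs.namenode.checkpoint.txns (default 1000000 transactions): a checkpoint becomes due when either limit is reached. The function below is only a sketch of that rule. The name checkpoint_due and its arguments are hypothetical and not part of Hadoop; the real values on a cluster can be read with hdfs getconf, as noted in the comments.

```shell
# Sketch of the automatic checkpoint trigger rule (hypothetical helper).
# Real values can be read on a cluster node with:
#   hdfs getconf -confKey dfs.namenode.checkpoint.period
#   hdfs getconf -confKey dfs.namenode.checkpoint.txns
#
# Usage: checkpoint_due SECS_SINCE_CKPT TXNS_SINCE_CKPT [PERIOD] [TXNS_LIMIT]
# Returns 0 (true) when a checkpoint would be due, 1 otherwise.
checkpoint_due() {
  secs_since_ckpt="$1"
  txns_since_ckpt="$2"
  period="${3:-3600}"         # assumed default of dfs.namenode.checkpoint.period
  txns_limit="${4:-1000000}"  # assumed default of dfs.namenode.checkpoint.txns

  # Either condition alone is enough to trigger a checkpoint.
  [ "$secs_since_ckpt" -ge "$period" ] || [ "$txns_since_ckpt" -ge "$txns_limit" ]
}
```

For example, checkpoint_due 7200 10 is true because more than one period has passed, even though only 10 transactions were logged.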
That is the brief background on the NameNode and edit logs.
Now let's talk about our cluster (it is based on HDP version 2.6.5).
In the folder /var/hadoop/hdfs/namenode/current of each NameNode, we have the following fsimage files:
fsimage_0000000000000031788 100% 104KB 104.1KB/s 00:00
fsimage_0000000000000031788.md5 100% 62 0.1KB/s 00:00
fsimage_0000000000000041641 100% 104KB 104.1KB/s 00:00
fsimage_0000000000000041641.md5 100% 62 0.1KB/s 00:00
and also the edit logs:
.
.
.
-rw-r--r-- 1 hdfs hadoop 328138542 Jan 23 12:37 edits_0000000022056979997-0000000022059239786
-rw-r--r-- 1 hdfs hadoop 301415558 Jan 23 13:07 edits_0000000022059239787-0000000022061345588
-rw-r--r-- 1 hdfs hadoop 311747850 Jan 23 13:37 edits_0000000022061345589-0000000022063490851
-rw-r--r-- 1 hdfs hadoop 12 Jan 23 13:37 seen_txid
-rw-r--r-- 1 hdfs hadoop 330301440 Jan 24 07:10 edits_0000000022063490852-0000000022065448335
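A rough way to quantify how far the newest fsimage is behind the edit logs is to compare the transaction IDs embedded in the file names. The sketch below assumes only the naming shown in the listings above (fsimage_<txid> and edits_<start>-<end>); checkpoint_lag is a hypothetical helper, not an HDFS tool, and it only counts finalized segments that start after the newest fsimage.

```shell
# Hypothetical helper: estimate the replay backlog in a NameNode "current" dir
# purely from file names (fsimage_<txid>, edits_<start>-<end>).

# Strip leading zeros so the shell does not treat the number as octal.
strip_zeros() {
  s=${1#"${1%%[!0]*}"}
  echo "${s:-0}"
}

checkpoint_lag() {
  dir="$1"
  img_txid=0; last_txid=0; segments=0; lag=0

  # Newest checkpoint: highest txid among fsimage_<txid> files (skip .md5).
  for f in "$dir"/fsimage_*; do
    [ -e "$f" ] || continue
    name="${f##*/}"
    case "$name" in *.md5) continue ;; esac
    txid=$(strip_zeros "${name#fsimage_}")
    if [ "$txid" -gt "$img_txid" ]; then img_txid=$txid; fi
  done

  # Finalized edit segments: track the highest end txid and count the
  # segments that begin after the newest checkpoint.
  for f in "$dir"/edits_*-*; do
    [ -e "$f" ] || continue
    name="${f##*/}"
    range="${name#edits_}"
    start=$(strip_zeros "${range%-*}")
    end=$(strip_zeros "${range#*-}")
    if [ "$end" -gt "$last_txid" ]; then last_txid=$end; fi
    if [ "$start" -gt "$img_txid" ]; then segments=$((segments + 1)); fi
  done

  if [ "$last_txid" -gt "$img_txid" ]; then lag=$((last_txid - img_txid)); fi
  echo "latest_fsimage_txid=$img_txid last_edit_txid=$last_txid segments_to_replay=$segments txns_to_replay=$lag"
}
```

It would be called as checkpoint_lag /var/hadoop/hdfs/namenode/current; a large txns_to_replay value means the next restart has a lot of edits to replay.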
Now we start both NameNodes.
In the NameNode logs we see that the NameNode is replaying each of the edit logs (so if, for example, we have 1965 edit logs, the NameNode replays all of them one by one).
Example:
2020-01-27 06:20:37,306 INFO namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(266)) - replaying edit log: 2072759/2282427 transactions completed. (91%)
2020-01-27 06:20:38,307 INFO namenode.FSEditLogLoader (FSEditLogLoader.java:loadEditRecords(266)) - replaying edit log: 2214991/2282427 transactions completed. (97%)
So the NameNodes fully started in active/standby state only after replaying all 1965 edit logs, and this took almost 17 hours.
After restarting both NameNodes, we expected to get up-to-date fsimage files.
For example:
-rw-r--r-- 1 hdfs hadoop 445716 Jan 31 08:11 fsimage_0000000000000132222
-rw-r--r-- 1 hdfs hadoop 62 Jan 31 08:11 fsimage_0000000000000132222.md5
But in our case, after restarting both NameNodes, we get this (fsimage not updated; the timestamp is still from Jan 03):
-rw-r--r-- 1 hdfs hadoop 445716 Jan 03 07:11 fsimage_0000000000000132222
-rw-r--r-- 1 hdfs hadoop 62 Jan 03 07:11 fsimage_0000000000000132222.md5
So we can see that the fsimage was not updated, even though both NameNodes fully started (after 17 hours) and are in active/standby state.
Any suggestion why the fsimage was not updated to the current time?
You can create an fsimage file by running the checkpoint manually with these commands:
hdfs dfsadmin -safemode enter
hdfs dfsadmin -saveNamespace
hdfs dfsadmin -safemode leave
IMPORTANT: while these commands run, HDFS is in safe mode (read-only) and is not available for writes, so make sure you have HA active and that your clients are aware of this pause (it can take around 5 minutes or more to complete).
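If you script the three commands, it is worth making sure safe mode is left again even when saveNamespace fails, otherwise the cluster stays read-only. A minimal sketch follows, assuming it is run as the HDFS superuser (for example the hdfs user); the function name manual_checkpoint is hypothetical.

```shell
# Sketch: run a manual checkpoint and always try to leave safe mode afterwards.
# Assumes the hdfs CLI is on PATH and the caller is the HDFS superuser.
manual_checkpoint() {
  # Do not continue if safe mode could not be entered.
  hdfs dfsadmin -safemode enter || return 1

  rc=0
  # Write a new fsimage; remember a failure but keep going.
  hdfs dfsadmin -saveNamespace || rc=$?

  # Leave safe mode regardless of whether saveNamespace succeeded.
  hdfs dfsadmin -safemode leave || rc=$?

  return "$rc"
}
```

After manual_checkpoint returns 0, a new fsimage_<txid> file with a current timestamp should appear in the NameNode's current directory.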