I am new to Hadoop. I want to know the difference between a snapshot and the FsImage, both of which are described as capturing file system state in Hadoop. I heard that they do the same work. If so, what is the difference between them?
HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. Any change to the file system namespace or its properties is recorded by the NameNode. The NameNode uses a transaction log called the EditLog to persistently record every change that occurs to file system metadata. For example, creating a new file in HDFS or changing the replication factor of a file causes the NameNode to insert a record into the EditLog. The NameNode stores the EditLog as a file in its local host OS file system.
The FsImage and the EditLog go hand in hand, which is why I started with the EditLog. Now:
The entire file system namespace, including the mapping of blocks to files and file system properties, is stored in a file called the FsImage. The FsImage is stored as a file in the NameNode’s local file system.
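To make the relationship between the two files concrete, here is a minimal Python sketch. It is a toy model, not Hadoop's actual code: the class `ToyNameNode`, its methods, the operation names, and the JSON format are all made up for illustration. It only shows the idea that the FsImage is a checkpoint of the namespace and the EditLog holds the changes made since that checkpoint, so the current state can be rebuilt from the two together.

```python
import json


class ToyNameNode:
    """Toy model of NameNode metadata (hypothetical, not Hadoop's implementation)."""

    def __init__(self):
        # path -> {"replication": int, "blocks": [block ids]}
        self.namespace = {}
        # changes recorded since the last checkpoint
        self.edit_log = []

    def _apply(self, op):
        # Apply one logged operation to the in-memory namespace.
        if op["op"] == "create":
            self.namespace[op["path"]] = {
                "replication": op["replication"],
                "blocks": list(op["blocks"]),
            }
        elif op["op"] == "set_replication":
            self.namespace[op["path"]]["replication"] = op["replication"]
        else:
            raise ValueError("unknown operation: " + op["op"])

    def record(self, op):
        # Every metadata change is appended to the edit log, then applied.
        self.edit_log.append(op)
        self._apply(op)

    def checkpoint(self):
        # Write the whole namespace out as an "FsImage" and empty the edit log.
        image = json.dumps(self.namespace, sort_keys=True)
        self.edit_log = []
        return image

    @classmethod
    def restart(cls, image, edit_log):
        # Rebuild the state: load the image, then replay the logged edits.
        node = cls()
        node.namespace = json.loads(image)
        for op in edit_log:
            node.record(op)
        return node


nn = ToyNameNode()
nn.record({"op": "create", "path": "/data/a.txt",
           "replication": 3, "blocks": ["blk_1", "blk_2"]})
image = nn.checkpoint()  # the image now contains /data/a.txt
nn.record({"op": "set_replication", "path": "/data/a.txt", "replication": 2})

# The image alone is stale; image plus edit log gives the current state.
recovered = ToyNameNode.restart(image, nn.edit_log)
print(recovered.namespace == nn.namespace)
```

Note that nothing here stores file contents. Both structures describe metadata only, which is the key difference from what a user thinks of as a copy of the data.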
Snapshots are read-only, point-in-time copies of the file system. A snapshot can be taken of a subtree or of the entire file system, once the directory in question has been made snapshottable. Taking a snapshot does not copy any data blocks; it only records metadata such as the file size and block list for the files under the snapshottable directory.
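The "no data is copied" point can be sketched in a few lines of Python. Again this is a toy model under stated assumptions: `ToySnapshottableDir` and its methods are hypothetical names, and it copies the full metadata map on every snapshot for simplicity, which is not how HDFS tracks snapshot changes internally. It only illustrates that a snapshot keeps file sizes and block lists, so the old view stays readable after the live directory changes.

```python
import copy


class ToySnapshottableDir:
    """Toy model of a snapshottable directory (hypothetical, not HDFS code)."""

    def __init__(self):
        # file name -> {"size": int, "blocks": [block ids]}
        self.files = {}
        # snapshot name -> frozen copy of the metadata at that instant
        self.snapshots = {}

    def write(self, name, size, blocks):
        self.files[name] = {"size": size, "blocks": list(blocks)}

    def delete(self, name):
        del self.files[name]

    def create_snapshot(self, snap_name):
        # Only metadata is recorded; the block ids still refer to the
        # same blocks, so no file data is duplicated.
        self.snapshots[snap_name] = copy.deepcopy(self.files)

    def read_snapshot(self, snap_name):
        # Return a copy so callers cannot modify the read-only snapshot.
        return copy.deepcopy(self.snapshots[snap_name])


d = ToySnapshottableDir()
d.write("report.csv", 2048, ["blk_10", "blk_11"])
d.create_snapshot("s1")

# Change the live directory after the snapshot was taken.
d.delete("report.csv")
d.write("new.csv", 10, ["blk_12"])

snap = d.read_snapshot("s1")
print(sorted(snap))     # files visible in the snapshot
print(sorted(d.files))  # files visible in the live directory
```

On a real cluster the corresponding steps are `hdfs dfsadmin -allowSnapshot <path>` to make a directory snapshottable and `hdfs dfs -createSnapshot <path> [<snapshotName>]` to take the snapshot, which then appears under `<path>/.snapshot/`.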
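In simple terms, the FsImage is the NameNode's own persistent record of the namespace: which files and directories exist, which blocks make up each file, and related properties. Together with the EditLog, it lets the NameNode rebuild the current state of the file system. A snapshot, on the other hand, is a read-only image of a directory tree (or of the whole file system) as it was at a particular instant, which users can read from later. So the two do not do the same work.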
I hope this explains the difference.