hadoop hdfs

How to restore a deleted file in HDFS


I was asked the following question in an interview.

Interviewer: How do you recover a deleted file in HDFS?
Me: Copy or move it from the trash directory back to its original directory.
Interviewer: Is there any other way besides recovering it from the trash?
Me: No.

So my question is: is there really another way to recover deleted files, or was the interviewer just testing my confidence?

I found the approach below, which differs from hdfs dfs -cp/-mv, but it also retrieves the file from the trash.

hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true -D dfs.checksum.type=CRC32C -m 10 -pb -update /users/vijay/.Trash/ /application/data/vijay
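
For comparison, the plain trash recovery mentioned in the interview can be sketched as below. When trash is enabled (fs.trash.interval greater than 0), a file removed with hdfs dfs -rm is moved to .Trash/Current under the deleting user's home directory, with its original absolute path appended. The helper names trash_path and restore_from_trash and the RUN switch are hypothetical, the file name is made up for illustration, and the sketch assumes the default /user home directory prefix, which may differ on a given cluster.

```shell
# Hypothetical helper: print where a deleted file would sit in the trash,
# assuming the default home directory prefix /user and trash being enabled.
trash_path() {
  user="$1"       # HDFS user who deleted the file
  original="$2"   # absolute path of the file before deletion
  printf '/user/%s/.Trash/Current%s\n' "$user" "$original"
}

# Dry run by default: prints the command. Set RUN=1 to execute it on a cluster.
restore_from_trash() {
  src="$(trash_path "$1" "$2")"
  if [ "${RUN:-0}" = "1" ]; then
    hdfs dfs -mv "$src" "$2"
  else
    echo "hdfs dfs -mv $src $2"
  fi
}

restore_from_trash vijay /application/data/vijay/file.txt
```

The dry-run default makes it possible to check the computed trash path before anything is moved.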


Solution

  • Hadoop has provided an HDFS snapshot feature since version 2.1.0. You can use it to recover a file, provided a snapshot of the directory was taken before the file was deleted.

    First, create a snapshot:

    hdfs dfsadmin -allowSnapshot /user/hdfs/important
    hdfs dfs -createSnapshot /user/hdfs/important important-snapshot
    

    Next, delete a file:

    hdfs dfs -rm -r /user/hdfs/important/important-file.txt
    

    Finally, restore it:

    hdfs dfs -ls /user/hdfs/important/.snapshot/
    hdfs dfs -cp /user/hdfs/important/.snapshot/important-snapshot/important-file.txt /user/hdfs/important/
    hdfs dfs -cat /user/hdfs/important/important-file.txt
    

    P.S.: You have to use the cp command (not mv) to recover the deleted file this way, because files in a snapshot are read-only.

    I hope this answer helps.
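
The restore steps above can be wrapped in a small script, shown here as a minimal sketch rather than a definitive tool. The helper names snapshot_path and restore_from_snapshot and the RUN switch are hypothetical. The script first runs hdfs snapshotDiff, which reports what changed between a snapshot and the current state (written as "."), and then copies the file back with hdfs dfs -cp. It assumes paths without spaces and that the destination's parent directory still exists.

```shell
# Hypothetical helper: build the read-only path of a file inside a snapshot.
snapshot_path() {
  dir="$1"    # snapshottable directory, e.g. /user/hdfs/important
  snap="$2"   # snapshot name, e.g. important-snapshot
  rel="$3"    # file path relative to the snapshottable directory
  printf '%s/.snapshot/%s/%s\n' "${dir%/}" "$snap" "$rel"
}

# Dry run by default: prints the commands. Set RUN=1 to execute them on a cluster.
restore_from_snapshot() {
  src="$(snapshot_path "$1" "$2" "$3")"
  dest="${1%/}/$3"
  # 1) show what changed since the snapshot, 2) copy the file back
  for cmd in "hdfs snapshotDiff ${1%/} $2 ." "hdfs dfs -cp $src $dest"; do
    # $cmd is left unquoted on purpose so it splits into command and arguments
    if [ "${RUN:-0}" = "1" ]; then $cmd; else echo "$cmd"; fi
  done
}

restore_from_snapshot /user/hdfs/important important-snapshot important-file.txt
```

Running hdfs lsSnapshottableDir beforehand shows which directories have snapshots enabled for the current user.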