I was asked the question below in an interview.
Interviewer: How do you recover a deleted file in HDFS? Me: We can copy or move it back from the trash directory to its original directory. Interviewer: Is there any other way besides recovering it from trash? Me: No.
So my question is whether there really is another way to recover deleted files, or whether the interviewer was just testing my confidence.
I did find the approach below, which uses distcp instead of hdfs dfs -cp/-mv, but it still gets the file from trash:
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true -D dfs.checksum.type=CRC32C -m 10 -pb -update /users/vijay/.Trash/ /application/data/vijay
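For comparison, the plain trash restore from the interview answer could be sketched roughly as below. This assumes the default trash layout, /user/<name>/.Trash/Current/<original absolute path>, which may differ on a cluster with a custom home directory prefix. The function names and the HDFS_CMD override are hypothetical, added only so the sketch can be dry-run without a cluster.

```shell
#!/bin/sh
# Sketch: restore a file from a user's HDFS trash.
# Assumption: default layout /user/<name>/.Trash/Current/<original absolute path>.
# HDFS_CMD is a hypothetical override for dry runs (e.g. HDFS_CMD=echo).
HDFS_CMD="${HDFS_CMD:-hdfs}"

# trash_path USER ORIGINAL_PATH
# Prints where the deleted file is expected to sit inside the trash.
trash_path() {
    printf '/user/%s/.Trash/Current%s\n' "$1" "$2"
}

# restore_from_trash USER ORIGINAL_PATH
# Moves the file from the trash back to its original location.
restore_from_trash() {
    "$HDFS_CMD" dfs -mv "$(trash_path "$1" "$2")" "$2"
}
```

A call would look like restore_from_trash vijay /application/data/vijay/part-0000, where part-0000 is a made-up file name. This only works while the file is still in trash, that is, before the trash interval expires and if the file was not deleted with -skipTrash.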
Hadoop has provided an HDFS snapshot feature since version 2.1.0. You can try using it.
First, allow snapshots on the directory and create a snapshot:
hdfs dfsadmin -allowSnapshot /user/hdfs/important
hdfs dfs -createSnapshot /user/hdfs/important important-snapshot
Next, delete a file:
hdfs dfs -rm -r /user/hdfs/important/important-file.txt
Finally, restore it by copying the file out of the snapshot:
hdfs dfs -ls /user/hdfs/important/.snapshot/
hdfs dfs -cp /user/hdfs/important/.snapshot/important-snapshot/important-file.txt /user/hdfs/important/
hdfs dfs -cat /user/hdfs/important/important-file.txt
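The restore step above could be wrapped in a small helper, sketched below. It relies only on the .snapshot path convention and hdfs dfs -cp shown above. The function names and the HDFS_CMD override are hypothetical, added so the sketch can be dry-run without a cluster.

```shell
#!/bin/sh
# Sketch: restore one file from a named HDFS snapshot.
# HDFS_CMD is a hypothetical override for dry runs (e.g. HDFS_CMD=echo).
HDFS_CMD="${HDFS_CMD:-hdfs}"

# snapshot_path DIR SNAPSHOT FILE
# Prints the read-only path of FILE inside SNAPSHOT of the snapshottable DIR.
snapshot_path() {
    printf '%s/.snapshot/%s/%s\n' "${1%/}" "$2" "$3"
}

# restore_from_snapshot DIR SNAPSHOT FILE
# Copies (does not move) the file back, since snapshot contents are read-only.
# Assumes FILE sits directly under DIR; a nested file would need its
# parent directory as the copy target instead.
restore_from_snapshot() {
    dir="${1%/}"
    "$HDFS_CMD" dfs -cp "$(snapshot_path "$dir" "$2" "$3")" "$dir/"
}
```

With the names used above, the call would be restore_from_snapshot /user/hdfs/important important-snapshot important-file.txt. Note that this only recovers files that existed when the snapshot was taken.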
P.S. You have to use cp (not mv) to recover a deleted file this way, because files inside a snapshot are read-only.
Hope this helps.