I have to move some HDFS files from my production cluster to my dev cluster. After the move, I need to test some operations on those files that depend on file modification time, so I need files with different dates in dev.
I tried DistCp, but it sets the modification time to the current time. I went through many of the DistCp parameters that I found in the distcp version2 guide.
Is there another way to copy the files without changing the modification time? Or can I change the modification time manually after the files are in HDFS?
thanks in advance
Use the -pt flag with the hadoop distcp command. This will preserve the timestamp (modification time) of the file that is distcp'd.
hadoop distcp -pt hdfs://src_cluster/file hdfs://dest_cluster/file
Tested with Hadoop 2.7.3.
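To confirm that the timestamp survived the copy, you can compare the modification time on both sides with hadoop fs -stat. The following is only a sketch: same_mtime is a hypothetical helper name, the two paths are the placeholder paths from the command above, and it assumes the hadoop client on your PATH can reach both clusters.

```shell
# same_mtime SRC DST
# Hypothetical helper (not part of Hadoop): compares the modification
# time of two HDFS paths using `hadoop fs -stat`.
# The %Y format prints the modification time in milliseconds since the epoch.
same_mtime() {
  src_mtime=$(hadoop fs -stat '%Y' "$1") || return 2
  dst_mtime=$(hadoop fs -stat '%Y' "$2") || return 2
  if [ "$src_mtime" = "$dst_mtime" ]; then
    echo "match: $src_mtime"
  else
    echo "differ: $src_mtime vs $dst_mtime"
    return 1
  fi
}
```

After the distcp finishes, call it as same_mtime hdfs://src_cluster/file hdfs://dest_cluster/file. It returns 0 when the times are equal, 1 when they differ, and 2 if either stat call fails.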
Refer to the latest DistCp Guide.