hadoop hdfs distcp bigdata

How to copy HDFS files from one cluster to another while preserving the modification time


I have to move some HDFS files from my production cluster to my dev cluster. After the move, I need to test some operations on those files that depend on the file modification time, so I need files with different dates in dev.

I tried DistCp, but it sets the modification time to the current time. I tested DistCp with many of the parameters I found in the DistCp Version 2 guide.

Is there any other way to copy the files without changing the modification time? Or can I change the modification time manually after the files are in HDFS?

Thanks in advance.


Solution

  • Use the -pt flag with the hadoop distcp command. It preserves the timestamp (modification time) of each file that is copied.

    hadoop distcp -pt hdfs://src_cluster/file hdfs://dest_cluster/file
    

    Tested with Hadoop 2.7.3.

    Refer to the latest DistCp Guide.
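To confirm that the timestamp really was preserved, one option is to compare the modification time on both clusters after the copy. The sketch below is not part of the original answer: `compare_mtime` is a hypothetical helper name, and it assumes the `hdfs` client is on the PATH and can reach both clusters. It relies on `hdfs dfs -stat` with the `%Y` format, which prints the modification time in milliseconds since the epoch.

```shell
# Hypothetical helper: compare the modification times of two HDFS paths.
# Prints both values (epoch milliseconds) and returns 0 when they match,
# 1 when they differ, and 2 when either stat call fails.
compare_mtime() {
  local src_mtime dst_mtime
  src_mtime=$(hdfs dfs -stat '%Y' "$1") || return 2
  dst_mtime=$(hdfs dfs -stat '%Y' "$2") || return 2
  echo "source=$src_mtime destination=$dst_mtime"
  [ "$src_mtime" = "$dst_mtime" ]
}
```

For example, `compare_mtime hdfs://src_cluster/file hdfs://dest_cluster/file` should report matching values after a copy made with -pt, and differing values after a plain distcp.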