eclipse hadoop hdfs

Fastest way to upload text files into HDFS (Hadoop)


I am trying to upload 1 million text files into HDFS. Uploading those files using Eclipse takes around 2 hours. Can anyone suggest a faster technique? What I am thinking of is to zip all the text files into a single archive, upload that archive to HDFS, and then extract the files onto HDFS with some unzipping technique. Any help will be appreciated.


Solution

  • DistCp is a good way to upload files to HDFS, but for your particular use case (uploading local files to a single-node cluster running on the same computer) the best option is not to upload the files to HDFS at all. You can use localfs (file://a_file_in_your_local_disk) instead of HDFS, so there is no need to upload the files.

    See this other SO question for examples on how to do this.
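
As a rough illustration of the localfs suggestion above, the sketch below turns local paths into file:// URIs using only the Java standard library. The class name LocalFsUris and its helper methods are hypothetical, made up for this example. Passing the resulting URI to a Hadoop job as its input path (for example via FileInputFormat.addInputPath) is assumed and not shown, since that part needs Hadoop on the classpath.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: builds file:// URIs for local files so a job can
// read them in place instead of uploading them to HDFS first.
public class LocalFsUris {

    // Convert a local path into an absolute file:// URI string.
    static String toLocalFsUri(Path path) {
        return path.toAbsolutePath().normalize().toUri().toString();
    }

    // List the .txt files directly inside dir as sorted file:// URIs.
    static List<String> textFileUris(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".txt"))
                .map(LocalFsUris::toLocalFsUri)
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Directory holding the text files; defaults to the current directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        // With many files, the directory URI is usually the more practical input path.
        System.out.println("input directory: " + toLocalFsUri(dir));
        for (String uri : textFileUris(dir)) {
            System.out.println(uri);
        }
    }
}
```

With a million files it is probably more practical to hand the job the directory URI than each file URI; the per-file listing is only there to show what the URIs look like.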