scala amazon-s3 emr distcp

DistCp: copying a file from HDFS to S3 (how to use it from Scala or Java)


I am trying to copy huge files from HDFS to S3 with DistCp, using the following code:

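import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.tools.{DistCp, DistCpOptions}
import org.apache.hadoop.util.ToolRunner

val files: Array[String] = new Array[String](2)
files(0) = "/****/in.zip"

val in = new Path(new URI("/**/in.zip"))
val out = new Path(new URI("***/out.zip"))
val distcpOpt = new DistCpOptions(in, out)
ToolRunner.run(new DistCp(new Configuration(), distcpOpt), files)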

I tried to do something similar to this link.

Has anyone done this before? Please help.


Solution

  • I found the solution:

    1- The files array should have two values: the first for the input path and the second for the output path.

    2- distcpOpt does not need to have any value (an empty string is enough).

    3- Make sure the S3 path is correct.
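
The three points above might look roughly like the following in Scala. This is a minimal sketch, not a tested recipe: the object name `DistCpToS3`, the helper `distCpArgs`, and both example paths (including the bucket name and the `s3a://` scheme) are hypothetical placeholders. Passing `null` for the options is one reading of point 2, and it assumes DistCp parses its options from the argument array when it runs.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.tools.DistCp
import org.apache.hadoop.util.ToolRunner

object DistCpToS3 {

  // Point 1: the argument array carries the source first and the target second.
  def distCpArgs(source: String, target: String): Array[String] =
    Array(source, target)

  def main(args: Array[String]): Unit = {
    val conf = new Configuration()

    // Hypothetical paths; substitute your own. Point 3: check the S3 URI
    // (scheme, bucket, key) carefully; the scheme depends on the cluster setup.
    val files = distCpArgs(
      "hdfs:///user/example/in.zip",
      "s3a://example-bucket/backup/out.zip"
    )

    // Point 2 (assumption): no pre-built DistCpOptions is supplied, so DistCp
    // takes the source and target from `files`.
    val exitCode = ToolRunner.run(conf, new DistCp(conf, null), files)
    if (exitCode != 0) sys.error(s"DistCp failed with exit code $exitCode")
  }
}
```

The S3 credentials and filesystem implementation still have to be configured on the cluster, and the right scheme may differ from the one shown here (for example `s3://` on EMR).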