hadoop, amazon-web-services, amazon-ec2, amazon-s3, starcluster

MIT StarCluster and S3


I am trying to run a MapReduce job on spot instances. I launch the instances with StarCluster and its Hadoop plugin. I have no problem uploading the data, putting it into HDFS, and then copying the result back out of HDFS. My question: is there a way to load the data directly from S3 and push the result back to S3? (I don't want to manually download the data from S3 into HDFS and then push the result from HDFS to S3. Is there a way to do this in the background?)

I am using the standard MIT StarCluster AMI.


Solution

  • You cannot do it automatically, but you can write a script to do it. For example, you can use: hadoop distcp s3n://ID:key@mybucket/file /user/root/file to copy the file from S3 straight into HDFS.
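
A script along those lines might look like the sketch below. It only wraps the distcp command from the answer in two helper functions, one for each direction. The function names, the AWS_ID and AWS_KEY variables, the bucket, the paths, and the job jar are all hypothetical placeholders and not part of StarCluster or its Hadoop plugin, so adjust them to your setup.

```shell
#!/usr/bin/env bash
# Sketch only: stage data S3 -> HDFS, run a job, then push results HDFS -> S3.
# Every name here (functions, variables, bucket, paths, jar) is a placeholder.

# Command used to invoke Hadoop; set HADOOP=echo to print commands instead.
HADOOP="${HADOOP:-hadoop}"

# Build an s3n:// URI with embedded credentials, in the same form as the
# answer above. Expects AWS_ID and AWS_KEY to be set by the caller.
s3n_uri() {
    local bucket="$1" key_path="$2"
    echo "s3n://${AWS_ID}:${AWS_KEY}@${bucket}/${key_path}"
}

# Copy from S3 into HDFS: stage_in <bucket> <s3-path> <hdfs-path>
stage_in() {
    "$HADOOP" distcp "$(s3n_uri "$1" "$2")" "$3"
}

# Copy from HDFS back to S3: stage_out <hdfs-path> <bucket> <s3-path>
stage_out() {
    "$HADOOP" distcp "$1" "$(s3n_uri "$2" "$3")"
}

# Hypothetical end-to-end run; myjob.jar and its arguments are assumptions.
run_pipeline() {
    stage_in mybucket input /user/root/input &&
    "$HADOOP" jar myjob.jar /user/root/input /user/root/output &&
    stage_out /user/root/output mybucket output
}
```

After exporting AWS_ID and AWS_KEY, calling run_pipeline on the master node would run the three steps in order and stop at the first failure. Running it with HADOOP=echo first prints the commands without executing them, which is a cheap way to check the URIs.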