How do I get Amazon EMR (Hadoop 0.20.205, MapR distribution) to use S3 buckets for input and output?
I tried adding the following to the core configuration XML file (core-site.xml, via a bootstrap action):
<property>
  <name>fs.default.name</name>
  <value>s3n://</value>
</property>
<property>
  <name>dfs.name.default</name>
  <value>s3n://</value>
</property>
But I always get something like:
Caused by: java.io.IOException: Could not resolve path: s3n://some_out_bucket/out
    at com.mapr.fs.MapRFileSystem.lookupClient(MapRFileSystem.java:219)
    at com.mapr.fs.MapRFileSystem.delete(MapRFileSystem.java:385)
    at cc.mrlda.ParseCorpus.run(ParseCorpus.java:192)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at cc.mrlda.ParseCorpus.main(ParseCorpus.java:675)
    ... 10 more
Hadoop newbie here. Please help!
In addition to the configuration steps described in the question above, I modified the code to obtain the FileSystem from the output path's URI:
FileSystem fs = FileSystem.get(URI.create(outputPath), new JobConf(SomeClass.class));
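where outputPath points to a resource on S3, e.g. s3n://some_bucket.
Because the FileSystem is now looked up from the URI built with URI.create, Hadoop picks the implementation registered for the URI's s3n scheme instead of the cluster's default file system (MapR-FS, which is what raised the "Could not resolve path" error in the stack trace). With this change I am able to access files directly from S3.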
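To illustrate why the URI matters, here is a minimal stdlib-only sketch (it uses java.net.URI and no Hadoop jars, so it only shows the parsing side). It splits a path string the same way URI.create does and shows the scheme and authority that Hadoop's FileSystem.get(URI, Configuration) looks at when choosing a FileSystem implementation. A path without a scheme carries no file system information, so Hadoop falls back to the default file system. The class and method names (S3PathParts, describe, usesDefaultFileSystem) and the path /user/hadoop/out are hypothetical, chosen for illustration only.

```java
import java.net.URI;

// Stdlib-only sketch (no Hadoop dependency): shows how URI.create splits an
// output path into the parts Hadoop's FileSystem.get(URI, Configuration)
// inspects when it picks a FileSystem implementation.
public class S3PathParts {

    // Returns "scheme | authority | path" for a Hadoop-style path string.
    public static String describe(String outputPath) {
        URI uri = URI.create(outputPath);
        // getAuthority() rather than getHost(): a bucket name containing '_'
        // is not a valid host name, so getHost() would return null here.
        return uri.getScheme() + " | " + uri.getAuthority() + " | " + uri.getPath();
    }

    // A path with no scheme (e.g. "/user/hadoop/out") names no file system,
    // so Hadoop would fall back to the default one (fs.default.name), which
    // the stack trace suggests is MapR-FS on this cluster.
    public static boolean usesDefaultFileSystem(String path) {
        return URI.create(path).getScheme() == null;
    }

    public static void main(String[] args) {
        // The first path comes from the error message in the question;
        // the second is a hypothetical scheme-less path for comparison.
        String[] paths = {"s3n://some_out_bucket/out", "/user/hadoop/out"};
        for (String p : paths) {
            System.out.println(describe(p) + "  default FS? " + usesDefaultFileSystem(p));
        }
    }
}
```

In the job driver itself, the Hadoop counterpart is the snippet above: FileSystem.get(URI.create(outputPath), conf) returns the s3n file system, and calls on it such as fs.delete(new Path(outputPath), true) then go to S3 rather than to MapR-FS.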