hadoopamazon-s3apache-sparkjets3t

Configure Hadoop to use S3 requester-pays-enabled


I'm using Hadoop (via Spark), and need to access S3N content which is requester-pays. Normally, this is done by enabling httpclient.requester-pays-buckets-enabled = true in jets3t.properties. Yet, I've set this and Spark / Hadoop are ignoring it. Perhaps I'm putting the jets3t.properties in the wrong place (/usr/share/spark/conf/). How can I get Hadoop / Spark / JetS3t to access requestor-pays buckets?

UPDATE: This is needed if you are outside Amazon EC2. Within EC2, Amazon doesn't require requester-pays. So, a crude workaround is to run out of EC2.


Solution

  • Environment variables and config files didn't work, but some manual code did: sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "PUTTHEKEYHERE")