Tags: amazon-web-services, hadoop, hive, cloudera, cloudera-quickstart-vm

Setting up AWS Credentials - Cloudera Quickstart Docker Container


I am trying to use Cloudera's Quickstart Docker container to test simple Hadoop/Hive jobs. I want to be able to run jobs on data in S3, but so far I am running into problems.

I have added the properties below to core-site.xml, hive-site.xml, and hdfs-site.xml.

  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>XXXXXX</value>
  </property>

  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>XXXXXX</value>
  </property>

Regardless, when I try to create an external table in Hive pointing to an S3 location, I get the error:

FAILED: SemanticException java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
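
For reference, the statement I am running looks roughly like the following (the table, columns, and bucket path are placeholders):

  hive -e "CREATE EXTERNAL TABLE my_table (id INT, name STRING)
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
           LOCATION 's3://my-bucket/my-path/';"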

Solution

  • There are a number of places where AWS credentials can be set in the Cloudera Quickstart container, but credential properties in the Hadoop config files must be set before the Cloudera services start. It can also help to export the AWS keys as environment variables.

    An example of a Docker image that sets AWS credentials in a Cloudera Quickstart container can be found here, and a blog post on this image can be seen here.

    Essentially, the Dockerfile for this image uses a shell script (contents shown below) to set the AWS keys as environment variables and to update /etc/hadoop/conf/core-site.xml with the s3n and s3a credential properties via sed. The script runs before any of the Cloudera services in the Quickstart container start; a rough usage sketch follows the script.

    #!/bin/bash
    
    # ADD ACTUAL AWS KEYS HERE BEFORE RUNNING SCRIPT/BUILDING DOCKER IMAGE
    #######################################################################
    AWS_ACCESS_KEY_ID=REPLACE-ME
    AWS_SECRET_ACCESS_KEY=REPLACE-ME
    #######################################################################
    
    # add aws creds to .bashrc
    echo "export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID" >> /root/.bashrc
    echo "export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY" >> /root/.bashrc
    
    # make backup of core-site.xml
    mv /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/core-site.xml.bak
    
    # add aws credentials for s3a and s3n to core-site.xml
    cat /etc/hadoop/conf/core-site.xml.bak \
      | sed "s#<\/configuration>#<property>\n<name>fs.s3a.awsAccessKeyId<\/name>\n<value>${AWS_ACCESS_KEY_ID}<\/value>\n<\/property>\n<property>\n<name>fs.s3a.awsSecretAccessKey<\/name>\n<value>${AWS_SECRET_ACCESS_KEY}<\/value>\n<\/property>\n<property>\n<name>fs.s3n.awsAccessKeyId<\/name>\n<value>${AWS_ACCESS_KEY_ID}<\/value>\n<\/property>\n<property>\n<name>fs.s3n.awsSecretAccessKey<\/name>\n<value>${AWS_SECRET_ACCESS_KEY}<\/value>\n<\/property>\n<\/configuration>#g" \
      > /etc/hadoop/conf/core-site.xml
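
    To put the pieces together, a rough usage sketch follows. The image tag, bucket, and table names are placeholders, and the run command mirrors the standard cloudera/quickstart invocation:

    # build the image from the Dockerfile that copies in the script above
    docker build -t cloudera-quickstart-s3 .   # image tag is a placeholder

    # start the container the same way as the stock cloudera/quickstart image
    docker run --hostname=quickstart.cloudera --privileged=true -t -i \
      cloudera-quickstart-s3 /usr/bin/docker-quickstart

    # inside the container, confirm the credentials made it into core-site.xml
    grep -A1 "fs.s3n.awsAccessKeyId" /etc/hadoop/conf/core-site.xml

    # the external table DDL should now work against an s3n:// location
    hive -e "CREATE EXTERNAL TABLE my_table (id INT, name STRING)
             ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
             LOCATION 's3n://my-bucket/my-path/';"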