I am building a Spring Boot 3.2.5 application that retrieves data from Parquet files on an AWS S3 bucket. This data is then converted into CSV and loaded into a Postgres database.
This works fine for a while after the application starts. However, after about an hour I receive an exception: `software.amazon.awssdk.services.s3.model.S3Exception: The provided token has expired`.
I am using Hadoop libraries to read the Parquet file as that had proven to be the most conducive to the "read and convert" steps.
The Configuration class is org.apache.hadoop.conf.Configuration and is set up as follows:
```java
Configuration configuration = new Configuration();
configuration.set("fs.s3a.endpoint", filesystemEndpoint);
configuration.set("fs.defaultFS", filesystemType + s3BucketName + s3BucketSubdirectoryPath);

DefaultCredentialsProvider credentialsProvider =
        DefaultCredentialsProvider.builder().reuseLastProviderEnabled(Boolean.FALSE).build();
AwsCredentials awsCredentials = credentialsProvider.resolveCredentials();

if (ControllerUtils.isRunningOnK8s()) {
    configuration.set("fs.s3a.assumed.role.arn", eksRoleArn);
}

configuration.set("fs.s3a.access.key", awsCredentials.accessKeyId());
configuration.set("fs.s3a.secret.key", awsCredentials.secretAccessKey());

if (awsCredentials instanceof AwsSessionCredentials sessionCredentials) {
    configuration.set("fs.s3a.session.token", sessionCredentials.sessionToken());
}
```
The class in which this code resides is annotated with:
@Service
@Scope("prototype")
The `prototype` scope is an attempt to ensure a new instance gets created with each run; my searches to this point suggested that a credentials provider may otherwise be cached, resulting in an expired token. The `@Service` annotation is there because this class has a `@Scheduled` (cron) component.
I have tried other credentials providers without positive result, including `ProfileCredentialsProvider` and `StsAssumeRoleCredentialsProvider`.
How can I ensure my application doesn't retain an expired token?
When you inject AWS credentials (access key, secret, session token) directly into the Hadoop `Configuration` (via `fs.s3a.access.key`, etc.), the credentials are snapshotted at that moment and NOT automatically refreshed.
If you use temporary credentials (from STS, EKS IRSA, etc.), they expire by default after 1 hour.
The Hadoop S3A filesystem does not auto-refresh credentials once they are stored as static values in the `Configuration` object. Even with `DefaultCredentialsProvider`, resolution happens only once, at setup time, because you copy the resolved values into the config a single time and nothing ties those strings back to the provider.
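To make the failure mode concrete, here is the lifecycle of the snapshotted credentials, sketched using the question's own variables:

```java
// resolveCredentials() returns an immutable snapshot of whatever the
// provider chain sees *right now* (for STS/IRSA, typically a ~1-hour session).
AwsCredentials awsCredentials = credentialsProvider.resolveCredentials();

// Configuration.set() stores a plain string; nothing links it back to the
// provider, so later credential rotations are invisible to S3A.
configuration.set("fs.s3a.access.key", awsCredentials.accessKeyId());

// t = 0h  : S3 reads succeed
// t = ~1h : session expires -> S3Exception: The provided token has expired
```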
Result:
After about an hour, your app's S3 requests start failing with `S3Exception: The provided token has expired`.
The fix:
Do NOT set `fs.s3a.access.key`, `fs.s3a.secret.key`, or `fs.s3a.session.token` in your Hadoop config. Instead, configure `fs.s3a.aws.credentials.provider` to point at the right provider class.
For EKS, IAM roles, or EC2 instance roles, S3A will then look up the credentials itself and refresh them automatically when they rotate.
Example:
```java
// Pick ONE provider; a second set() call on the same key silently
// overrides the first.

// For Hadoop builds on AWS SDK v1 (pre-3.4), the SDK default chain:
configuration.set("fs.s3a.aws.credentials.provider",
        "com.amazonaws.auth.DefaultAWSCredentialsProviderChain");

// Or, for EC2/EKS instance- and pod-role credentials:
configuration.set("fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider");
```
Let AWS SDK handle token refresh!
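Putting it together, here is a minimal sketch of the corrected setup. Variable names such as `filesystemEndpoint` and `s3BucketName` come from the question; the cache-disable line is an optional extra, since Hadoop caches `FileSystem` instances per URI and a cached instance could otherwise keep using the client it was first built with:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration configuration = new Configuration();
configuration.set("fs.s3a.endpoint", filesystemEndpoint);

// Delegate credential lookup to a provider class. S3A re-resolves
// credentials through it on demand instead of reading frozen strings.
configuration.set("fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider");

// Deliberately NO fs.s3a.access.key / fs.s3a.secret.key / fs.s3a.session.token.

// Optional: bypass Hadoop's FileSystem cache so each scheduled run builds
// a fresh client rather than reusing a previously cached one.
configuration.setBoolean("fs.s3a.impl.disable.cache", true);

FileSystem fs = FileSystem.get(URI.create("s3a://" + s3BucketName), configuration);
```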