I'm using jclouds 2.5.0. It's working perfectly in all of our deployments except for one. In this case, we're seeing the following jclouds message in our log4j2 logs:
2022-07-14 21:37:29.263 +0000,3124098302712886 {} ERROR o.j.h.h.BackoffLimitedRetryHandler [clrd-highpri-1] Cannot retry after server error, command has exceeded retry limit 5: [method=org.jclouds.aws.s3.AWSS3Client.public abstract java.lang.String org.jclouds.s3.S3Client.getBucketLocation(java.lang.String)[hammerspace-data-bucket-us-west-2], request=GET https://s3.amazonaws.com/hammerspace-data-bucket-us-west-2?location HTTP/1.1]
This message occurs during a getBlob call, so I'm assuming part of getBlob is determining the region of the bucket the blob should be retrieved from. The call isn't just failing five times with a bad return code - it's hanging and timing out each time, so these five retries account for the lion's share of the time it takes to download the blob.
After getBlob finally stops calling getBucketLocation, it then tries the download with the default region (us-east-1). Since the bucket is actually in us-west-2, the download takes a bit longer than it should, but - again - the actual download bottleneck is the failed calls to getBucketLocation.
Has anyone seen anything like this before?
I'd also be interested in knowing how to turn on more jclouds logging. I used to uncomment lines like this in my log4j2.xml file:
<!-- <logger name="org.jclouds" level="debug" additivity="true" /> -->
<!-- <logger name="jclouds.compute" level="debug" additivity="true" /> -->
<!-- <logger name="jclouds.wire" level="debug" additivity="true" /> -->
<!-- <logger name="jclouds.headers" level="debug" additivity="true" /> -->
<!-- <logger name="jclouds.ssh" level="debug" additivity="true" /> -->
<!-- <logger name="software.amazon.awssdk" level="debug" additivity="true" /> -->
<!-- <logger name="org.apache.http.wire" level="debug" additivity="true" /> -->
But these don't seem to have any effect in 2.5.0 anymore.
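As far as I can tell, jclouds logs through its own logging abstraction, so these categories only produce output if a logging module is bound to the context - for example org.jclouds.logging.slf4j.config.SLF4JLoggingModule passed to contextBuilder.modules(...), with the log4j-slf4j bridge on the classpath. Assuming that module is wired up, a log4j2.xml fragment like this should bring the logging back (the "Console" appender name is just a placeholder for whatever appender you already have):

```xml
<Loggers>
    <!-- request/response headers only -->
    <Logger name="jclouds.headers" level="debug" additivity="true"/>
    <!-- full request/response bodies; very verbose -->
    <Logger name="jclouds.wire" level="debug" additivity="true"/>
    <Logger name="org.jclouds" level="debug" additivity="true"/>
    <Root level="info">
        <AppenderRef ref="Console"/>
    </Root>
</Loggers>
```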
Finally, if anyone knows how I can stop getBlob from calling getBucketLocation, I'd much appreciate some advice here. I'm thinking there must be a way to specify the desired bucket to the jclouds blob context up front so it doesn't have to resolve it.
John
[Update 1]
We originally thought the problem was that we didn't have our IAM profile configured correctly for the bucket, but after experimenting with it, we were able to run the AWS command-line tool against that bucket from the same host and it didn't hang - yet jclouds still hangs on getBucketLocation on the same box. I'm completely stumped by this. It HAS to be something internal to jclouds 2.5.0 with the AWS provider.
I've discovered the root cause of this issue and thought there might be others out there that would like to know what's going on.
Amazon publishes a general workflow that lets clients always find the correct endpoint URL for a given bucket:

1. A naive client asks for the bucket's location before every request.
2. A slightly smarter client asks only on the first request, caches the bucket location URL, and reuses it on subsequent requests.
3. An even smarter client notices when a region-specific URL has been specified and uses it directly to attempt the request. Only upon failure does it call back to the global endpoint (us-east-1) to get the bucket location, cache it, and use it.

Apparently, jclouds stops at level 2 above. It completely ignores the specified URL, but it does at least cache the result of the first getBucketLocation call and use that region-specific URL as needed.
Internally, it uses a Google Guava LoadingCache for this. It would be nice if jclouds offered a mechanism to pre-load this cache with known region-specific URLs for given buckets; then it would never have to go off box for the location data - even on the first request.
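To illustrate the pre-loading idea with a plain-Java sketch (Guava-free, using a ConcurrentHashMap in place of the LoadingCache; the names here are mine, not jclouds'):

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for jclouds' internal bucket->region cache.
// Entries seeded up front are returned immediately; only unknown buckets
// fall through to the (slow) remote getBucketLocation-style lookup.
class BucketRegionCache {
    private final Map<String, Optional<String>> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> remoteLookup;

    BucketRegionCache(Function<String, Optional<String>> remoteLookup) {
        this.remoteLookup = remoteLookup;
    }

    /** Pre-load a known mapping so no remote call is ever made for this bucket. */
    void preload(String bucket, String region) {
        cache.put(bucket, Optional.of(region));
    }

    /** Returns the cached region, invoking the remote lookup only on a miss. */
    Optional<String> regionFor(String bucket) {
        return cache.computeIfAbsent(bucket, remoteLookup);
    }
}
```

With a mapping pre-loaded for the bucket, regionFor() never touches the remote lookup - which is exactly the behavior I wanted from jclouds.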
I hope this is helpful to others; it cost me a lot of pain to find out. And since I received no answers to any of my jclouds mailing list queries, I have to assume no one in the jclouds community understood how this worked either. (Or perhaps I just didn't word my query well enough.)
[Update 2]
I did find a work around for this. I wrote this static inner class in my jclouds-consuming client:
@ConfiguresHttpApi
private static class BucketToRegionHack extends AWSS3HttpApiModule {
    private String region;
    private String bucket;

    public void setBucketForRegion(String region, String bucket) {
        this.region = region;
        this.bucket = bucket;
    }

    @Override
    @SuppressWarnings("Guava")
    protected CacheLoader<String, Optional<String>> bucketToRegion(Supplier<Set<String>> regionSupplier, S3Client client) {
        Set<String> regions = regionSupplier.get();
        if (regions.isEmpty()) {
            return new CacheLoader<String, Optional<String>>() {
                @Override
                @SuppressWarnings({"Guava", "NullableProblems"})
                public Optional<String> load(String bucket) {
                    if (BucketToRegionHack.this.bucket != null && BucketToRegionHack.this.bucket.equals(bucket)) {
                        return Optional.of(BucketToRegionHack.this.region);
                    }
                    return Optional.absent();
                }

                @Override
                public String toString() {
                    return "noRegions()";
                }
            };
        } else if (regions.size() == 1) {
            final String onlyRegion = Iterables.getOnlyElement(regions);
            return new CacheLoader<String, Optional<String>>() {
                @SuppressWarnings("OptionalUsedAsFieldOrParameterType")
                final Optional<String> onlyRegionOption = Optional.of(onlyRegion);

                @Override
                @SuppressWarnings("NullableProblems")
                public Optional<String> load(String bucket) {
                    if (BucketToRegionHack.this.bucket != null && BucketToRegionHack.this.bucket.equals(bucket)) {
                        return Optional.of(BucketToRegionHack.this.region);
                    }
                    return onlyRegionOption;
                }

                @Override
                public String toString() {
                    return "onlyRegion(" + onlyRegion + ")";
                }
            };
        } else {
            return new CacheLoader<String, Optional<String>>() {
                @Override
                @SuppressWarnings("NullableProblems")
                public Optional<String> load(String bucket) {
                    if (BucketToRegionHack.this.bucket != null && BucketToRegionHack.this.bucket.equals(bucket)) {
                        return Optional.of(BucketToRegionHack.this.region);
                    }
                    try {
                        return Optional.fromNullable(client.getBucketLocation(bucket));
                    } catch (ContainerNotFoundException e) {
                        return Optional.absent();
                    }
                }

                @Override
                public String toString() {
                    return "bucketToRegion()";
                }
            };
        }
    }
}
This is mostly a copy of the code as it exists in S3HttpApiModule in jclouds. Then I added the following snippet to the init code where I set up my jclouds client:
BucketToRegionHack b2mModule = new BucketToRegionHack();
contextBuilder.modules(ImmutableSet.of(b2mModule));

Pattern pattern = Pattern.compile("s3-([a-z0-9-]+)\\.amazonaws\\.com");
Matcher matcher = pattern.matcher(endpoint);
if (matcher.find()) {
    String region = matcher.group(1);
    b2mModule.setBucketForRegion(region, cspInfo.getContainer());
}
...where 'contextBuilder' is the jclouds context builder I'm using. This essentially overrides the S3HttpApiModule with my own version, which allows me to provide my own bucket-to-region method that pre-loads the LoadingCache with my known bucket and region.
A better fix would expose a way for users to simply preload the cache with a map of buckets to regions, so that no getBucketLocation calls would ever be made for the pre-loaded entries.