java, amazon-s3, streaming, netty, aws-crt

How can S3AsyncClient and AsyncResponseTransformer maintain back-pressure during download?


I have built a typical download API using the Spring Reactive stack and AWS Java SDK v2. Basically, there is a controller which calls the S3AsyncClient to download the object:

@GetMapping(path = "/{filekey}")
Mono<ResponseEntity<Flux<ByteBuffer>>> downloadFile(@PathVariable("filekey") String filekey) {
    GetObjectRequest request = GetObjectRequest.builder()
      .bucket(s3config.getBucket())
      .key(filekey)
      .build();

    // toPublisher() exposes the response body as a Publisher<ByteBuffer>
    return Mono.fromFuture(s3client.getObject(request, AsyncResponseTransformer.toPublisher()))
      .map(response -> {
        checkResult(response.response());
        String filename = getMetadataItem(response.response(), "filename", filekey);
        return ResponseEntity.ok()
          .header(HttpHeaders.CONTENT_TYPE, response.response().contentType())
          .header(HttpHeaders.CONTENT_LENGTH, Long.toString(response.response().contentLength()))
          .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=\"" + filename + "\"")
          // stream the body; back-pressure should flow back from the HTTP connection
          .body(Flux.from(response));
      });
}

The Javadoc for the publisher returned by AsyncResponseTransformer.toPublisher() includes this:

You are responsible for subscribing to this publisher and managing the associated back-pressure. Therefore, this transformer is only recommended for advanced use cases.
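For context, here is a minimal sketch (my own, not from the SDK docs) of what managing that back-pressure manually could look like: subscribing with Reactor's BaseSubscriber and requesting one buffer at a time. consumeChunk() is a hypothetical placeholder for whatever writes the chunk downstream.

// Sketch only: subscribe to the ResponsePublisher ourselves and bound the demand.
// Uses reactor.core.publisher.BaseSubscriber and org.reactivestreams.Subscription.
s3client.getObject(request, AsyncResponseTransformer.toPublisher())
    .thenAccept(responsePublisher -> responsePublisher.subscribe(new BaseSubscriber<ByteBuffer>() {
        @Override
        protected void hookOnSubscribe(Subscription subscription) {
            request(1); // ask for the first chunk only
        }

        @Override
        protected void hookOnNext(ByteBuffer buffer) {
            consumeChunk(buffer); // hypothetical: write the chunk to the client
            request(1);           // pull the next chunk only once this one is handled
        }
    }));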

Netty is configured to use the direct no-cleaner allocation path, i.e. it allocates direct ByteBuffers instead of heap buffers and uses Unsafe to allocate/deallocate them.

-Dio.netty.maxDirectMemory is set to 2 or 3 GB (I tested various values).

What I am seeing is that from time to time there are OutOfDirectMemoryError failures and the connection is dropped. The client gets a premature end of content stream.

It seems like S3AsyncClient may outpace the consumers of the data, so the direct buffers overflow no matter how much memory I give to Netty. The JVM heap stays at around 300 MB.

I came across this for Netty: OOM killed JVM with 320 x 16MB Netty DirectByteBuffer objects

You cannot control the amount of memory, short of causing OOM as you have done. Netty pooling won't behave like the Java GC vs the heap, i.e. increasing the throttling/frequency of its work in order to stay within specified limits (throwing OOM only under specific circumstances). Netty memory pooling is built to mimic the behaviour of a native allocator, e.g. jemalloc, hence its purpose is to retain as much memory as the application needs to work. For this reason, the retained direct memory depends on the allocation pressure that the application code creates, i.e. how many outstanding allocations there are without a release.

I suggest, instead, embracing its nature: prepare an interesting test load on a preprod/test machine and just monitor the Netty direct memory usage of the application you're interested in. I suppose you've configured -Dio.netty.maxDirectMemory=0 for the purpose of using JMX to expose the direct memory used, but Netty can expose its own metrics as well (saving you from setting io.netty.maxDirectMemory); just check that the libraries that use it take care of exposing them through JMX or whatever metrics framework you use. If these applications don't expose it, the API is fairly easy to use, see https://netty.io/4.1/api/io/netty/buffer/PooledByteBufAllocatorMetric.html
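As a side note, here is a minimal sketch (assuming the default pooled allocator is the one actually in use) of reading those metrics through the API linked above:

// Sketch: poll Netty's pooled allocator metrics directly
// (io.netty.buffer.PooledByteBufAllocator / PooledByteBufAllocatorMetric).
PooledByteBufAllocatorMetric metric = PooledByteBufAllocator.DEFAULT.metric();
System.out.println("used direct memory: " + metric.usedDirectMemory());
System.out.println("used heap memory:   " + metric.usedHeapMemory());
System.out.println("direct arenas:      " + metric.numDirectArenas());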

I am using Netty 4.1.89 or 4.1.108 (I tried updating), AWS SDK v2 2.23.21, and AWS CRT client 0.29.14 (the latest).

I tried doing Flux.from(response).limitRate(1) with no luck.

My performance test downloads 500 MB files in parallel with up to 40 users. The node has 8 GB of memory in total and 1 CPU unit.

I can understand that this is not enough to handle all the users, but I was expecting it to apply back-pressure automatically and keep streaming the files, just more slowly, i.e. get the next buffer from S3 -> write it to user1, get the next buffer from S3 -> write it to user2, etc.

However, even when I am using just one slow consumer, I see Netty report direct memory consumption of up to 500 MB, and if I stop the consumer it drops to 16 MB (the default PoolArena cache, I suppose). So it looks like the S3 async client pushes all 500 MB into Netty's direct buffers and the client slowly drains them.

Trying to limit the AWS CRT throughput with targetThroughputInGbps(0.1) didn't help either.
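For reference, this is roughly how that setting is applied on the CRT builder (a sketch; the rest of the builder configuration is omitted):

S3AsyncClient s3client = S3AsyncClient.crtBuilder()
    .targetThroughputInGbps(0.1) // attempt to throttle the CRT client
    .build();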

I have a feeling that S3AsyncClient + CRT + Spring Boot Netty doesn't automatically handle back-pressure: https://github.com/netty/netty/issues/13751

As I can't control the download speed on the client side (the connection might be slow or fast), how can I maintain back-pressure to keep the direct buffers under a certain limit? Is it possible at all?


Solution

  • I have opened an issue against aws sdk: https://github.com/aws/aws-sdk-java-v2/issues/5158

    At the same time I discovered the reason: the S3 async client (regardless of the underlying HTTP client) does respect the request(n) calls made on the Flux<ByteBuffer> by reactor-netty. The problem is the chunk size, which differs between the clients.

    s3-crt uses an 8 MB chunk by default.

    s3-netty uses an 8 KB chunk by default, i.e. a 1024 times smaller size.

    Reactor-netty requests 128 items first and then refills by 64 (see MonoSendMany and MonoSend, MAX_SIZE/REFILL_SIZE).

    Now, if your consumer is slow enough and downloads a large file, reactor-netty requests 128 * 8 MB = 1024 MB from s3-crt, and eventually reactor-netty's buffers fill up with that data even though the channel's WRITABILITY_CHANGED to false.

    And if you download multiple files, it's easy to hit the max direct memory limit.

    Since MAX_SIZE/REFILL_SIZE are hardcoded static fields in reactor-netty, the only solution is to reduce the S3 part/chunk size by using:

    S3AsyncClient s3client = S3AsyncClient.crtBuilder()
        .minimumPartSizeInBytes(1L * 1024 * 1024) // 1 MB parts instead of the default 8 MB
        .build();
    

    This lets s3-crt push at most 128 * 1 MB = 128 MB into the reactor-netty buffers per download request (see the arithmetic sketch at the end). While it may reduce the overall throughput/performance of the S3 async client and the downloads, it helps support more parallel downloads without failing with OutOfDirectMemoryError.

    This is more of a workaround than a solution, but until there is a way to configure reactor-netty's back-pressure MAX_SIZE/REFILL_SIZE, I'll have to use it.
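
    To make the arithmetic concrete, a small sketch using the numbers discussed above (MAX_SIZE is reactor-netty's hardcoded prefetch; the part sizes are the s3-crt default and the reduced value):

    // Rough upper bound per download: prefetched buffers * part size.
    long maxSize = 128;                     // reactor-netty MonoSend.MAX_SIZE
    long defaultCrtPart = 8L * 1024 * 1024; // s3-crt default part size: 8 MB
    long reducedCrtPart = 1L * 1024 * 1024; // minimumPartSizeInBytes(1 MB)

    System.out.println(maxSize * defaultCrtPart / (1024 * 1024) + " MB"); // 1024 MB
    System.out.println(maxSize * reducedCrtPart / (1024 * 1024) + " MB"); // 128 MB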