javaamazon-ec2

Intermittent java.io.IOException: Connection reset (not connection reset by peer) in jdk.internal.net.http.HttpClientImpl.send from inside Amazon EC2


Using openjdk 14.0.1 in Amazon EC2 I'm getting an intermittent java.io.IOException: Connection reset when hitting an external REST service with a POST request over https. This does not happen outside of EC2.

It happens once after a short period of inactivity and resolves once the REST service is called again. Rinse repeat.

The error does not occur hitting the external REST service directly via curl using a similar load, only hitting it after short periods of inactivity from within the JVM.

I've been focusing around the connection pooling in jdk.internal.net.http.Http2ClientImpl. My theory is a stale connection is being picked up from the pool but I have no concrete evidence of this.

Perhaps it's something else. What could be causing this?

In the meantime I've swapped out the HTTP client for the Apache offering and so far, so good.

Stack trace:

java.io.IOException: Connection reset
    at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:576)
    at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:119)
    at Main.main(Main.java:201)
Caused by: java.net.SocketException: Connection reset
    at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:345)
    at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:376)
    at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1153)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:821)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:175)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:763)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:941)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:245)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:968)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:923)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1510)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:923)
java.io.IOException: Connection reset
    at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:576)
    at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:119)
    at Main.main(Main.java:201)
Caused by: java.net.SocketException: Connection reset
    at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:345)
    at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:376)
    at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1153)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:821)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:175)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
    at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:763)
    at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:941)
    at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:245)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:968)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:923)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1510)
    at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:923)

With heavy debugging on:

DEBUG: [qtp1724731843-13] [1810s 799ms] AsyncSSLConnection(SSLTube(SocketTube(1))) HttpPublisher: sending 1646 bytes (2 buffers) to SSLTube(SocketTube(1))
DEBUG: [qtp1724731843-13] [1810s 800ms] SSLTube(SocketTube(1)) sending 2  buffers to SSL flow delegate
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) onNext
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) added 2 (1646 bytes) to the writeList
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) processData, writeList remaining:1646, hsTriggered:false, needWrap:false
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) wrapping 1646 bytes
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) SSLResult: Status = OK HandshakeStatus = NOT_HANDSHAKING
bytesConsumed = 1646 bytesProduced = 1684 sequenceNumber = 20
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) OK => produced: 1684 bytes into 1688, not wrapped: 0
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) wrapBuffer returned Status = OK HandshakeStatus = NOT_HANDSHAKING
bytesConsumed = 1646 bytesProduced = 1684 sequenceNumber = 20
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) Sending 1684 bytes downstream
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) Adding 1684 to outputQ queue
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) pushScheduler is alive
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) DownstreamPusher: queue not empty, downstreamSubscription: SubscriptionBase: window = 1 cancelled = false
DEBUG: [qtp1724731843-13] [1810s 800ms] SSL Writer(SocketTube(1)) DownstreamPusher: Pushing 1684 bytes downstream
DEBUG: [qtp1724731843-13] [1810s 800ms] SocketTube(1) trying to write: 1684
DEBUG: [qtp1724731843-13] [1810s 807ms] SocketTube(1) write error: java.io.IOException: Broken pipe
DEBUG: [qtp1724731843-13] [1810s 808ms] SocketTube(1) error signalled java.io.IOException: Broken pipe
DEBUG: [qtp1724731843-13] [1810s 808ms] SocketTube(1) write: cancel
DEBUG: [qtp1724731843-13] [1810s 808ms] SocketTube(1) write: resetting demand to 0
DEBUG: [qtp1724731843-13] [1810s 808ms] AsyncSSLConnection(SSLTube(SocketTube(1))) HttpPublisher: cancelled by SSLTube(SocketTube(1))
DEBUG: [HttpClient-1-SelectorManager] [1810s 809ms] SocketTube(1) Sending error java.net.SocketException: Connection reset to subscriber READER: SubscriberWrapper: upstreamCompleted: false upstreamWindow: 1 downstreamCompleted: false completionAcknowledged: false outputQ size: 0 cf: jdk.internal.net.http.common.MinimalFuture@e19f43d[Not completed, 4 dependents] (id=21) downstreamSubscription: SubscriptionBase: window = 9223372036854775778 cancelled = false downstreamSubscriber: SSLSubscriberWrapper[SSLTube(SocketTube(1)), delegate: DelegateWrapper[subscribedCalled: true, subscribedDone: true, completed: false, error: null]: jdk.internal.net.http.Http2Connection$Http2TubeSubscriber@6500fba3, getALPN: jdk.internal.net.http.common.MinimalFuture@7dbd5c04[Completed normally] (id=27), onCompleteReceived: false, onError: null], readBuf: java.nio.HeapByteBuffer[pos=1618 lim=1618 cap=4096], count: 14238, scheduler: running, status: OK, handshakeState: 0, engine: NOT_HANDSHAKING
DEBUG: [HttpClient-1-SelectorManager] [1810s 809ms] SocketTube(1) forwarding error to subscriber: java.net.SocketException: Connection reset
DEBUG: [HttpClient-1-SelectorManager] [1810s 809ms] SSL Reader(SocketTube(1)) onError: java.net.SocketException: Connection reset
DEBUG: [HttpClient-1-SelectorManager] [1810s 809ms] SSL Reader(SocketTube(1)) error java.net.SocketException: Connection reset
DEBUG: [qtp1724731843-13] [1810s 810ms] SocketTube(1) leaving w.onNext Reading: [ops=0, demand=0, stopped=false], Writing: [ops=0, demand=0]
DEBUG: [qtp1724731843-13] [1810s 810ms] SSL Writer(SocketTube(1)) upstreamWindowUpdate, downstreamQueueSize:0, upstreamWindow:0
DEBUG: [qtp1724731843-13] [1810s 810ms] SSL Writer(SocketTube(1)) requesting 1
DEBUG: [qtp1724731843-13] [1810s 810ms] SSLTube(SocketTube(1)) request: n=1
DEBUG: [qtp1724731843-13] [1810s 810ms] AsyncSSLConnection(SSLTube(SocketTube(1))) HttpPublisher: got request of 1 from SSLTube(SocketTube(1))
DEBUG: [qtp1724731843-13] [1810s 810ms] Http2Connection(SSLTube(SocketTube(1)))/Stream(19) trySend: request 1
DEBUG: [qtp1724731843-13] [1810s 811ms] Http2Connection(SSLTube(SocketTube(1)))/Stream(19) RequestSubscriber: onComplete
java.net.SocketException: Connection reset
        at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:345)
        at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:376)
        at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1153)
        at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:821)
        at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:175)
        at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
        at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
        at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
        at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:763)
        at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:941)

Solution

  • TLDR: A NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection (it does not send a FIN packet). IMHO it should send a FIN packet like everything else on the planet.

    Long Answer: Using a for loop I started sending requests every i minutes starting a 1 minute. This is what I got at 6 minutes:

    Sleeping for 360000
    Request sent:2020-08-30T16:20:29.467902
    Exception caught:2020-08-30T16:20:29.492103
    java.io.IOException: Connection reset
            at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:576)
            at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:119)
            at Main.main(Main.java:207)
    Caused by: java.net.SocketException: Connection reset
            at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:345)
            at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:376)
            at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1153)
            at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:821)
            at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:175)
            at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
            at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
            at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
            at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:763)
            at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:941)
            at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:245)
            at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:968)
            at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:923)
            at java.base/java.util.ArrayList.forEach(ArrayList.java:1510)
            at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:923)
    java.io.IOException: Connection reset
            at java.net.http/jdk.internal.net.http.HttpClientImpl.send(HttpClientImpl.java:576)
            at java.net.http/jdk.internal.net.http.HttpClientFacade.send(HttpClientFacade.java:119)
            at Main.main(Main.java:207)
    Caused by: java.net.SocketException: Connection reset
            at java.base/sun.nio.ch.SocketChannelImpl.throwConnectionReset(SocketChannelImpl.java:345)
            at java.base/sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:376)
            at java.net.http/jdk.internal.net.http.SocketTube.readAvailable(SocketTube.java:1153)
            at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.read(SocketTube.java:821)
            at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowTask.run(SocketTube.java:175)
            at java.net.http/jdk.internal.net.http.common.SequentialScheduler$SchedulableTask.run(SequentialScheduler.java:198)
            at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:271)
            at java.net.http/jdk.internal.net.http.common.SequentialScheduler.runOrSchedule(SequentialScheduler.java:224)
            at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$InternalReadSubscription.signalReadable(SocketTube.java:763)
            at java.net.http/jdk.internal.net.http.SocketTube$InternalReadPublisher$ReadEvent.signalEvent(SocketTube.java:941)
            at java.net.http/jdk.internal.net.http.SocketTube$SocketFlowEvent.handle(SocketTube.java:245)
            at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.handleEvent(HttpClientImpl.java:968)
            at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.lambda$run$3(HttpClientImpl.java:923)
            at java.base/java.util.ArrayList.forEach(ArrayList.java:1510)
            at java.net.http/jdk.internal.net.http.HttpClientImpl$SelectorManager.run(HttpClientImpl.java:923)
    

    This led me to finding this issue which explains that the problem is caused by the default AWS NAT Gateway behaviour. From the docs:

    Timeout behavior

    When a connection times out, a NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection (it does not send a FIN packet).

    Which I can see as the last event in the tcpdump:

    IP 111.11.11.11.https > ip-11-111-11-11.eu-west-1.compute.internal.11111: Flags [R], seq 4144024018, win 637, length 0
    

    Outside of AWS, I get the FIN packet:

    IP 111.11.11.11.https > ip-11-111-11-11.eu-west-1.compute.internal.11111: Flags [F.], seq 5972, ack 7484, win 86, length 0
    

    I tried setting jdk.httpclient.keepalive.timeout based on this article but it does not resolve the issue. Apache it is.

    Here's another related thread dating back to 2016.

    Update

    I got to thinking, Apache is coming out clean because it isn't using pooling out-the-box. I discovered org.apache.http.impl.conn.PoolingHttpClientConnectionManager and set about reproducing the problem. It blows up too:

    Exception in thread "main" javax.net.ssl.SSLException: Connection reset
            at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:325)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:268)
            at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:263)
            at java.base/sun.security.ssl.SSLSocketImpl.handleException(SSLSocketImpl.java:1654)
            at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1038)
            at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
            at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
            at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:280)
            at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
            at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
            at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:259)
            at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:163)
            at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:157)
            at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:273)
            at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
            at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
            at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:186)
            at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
            at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
            at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
            at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
            at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:108)
            at Main.main(Main.java:226)
            Suppressed: java.net.SocketException: Broken pipe
                    at java.base/sun.nio.ch.NioSocketImpl.implWrite(NioSocketImpl.java:420)
                    at java.base/sun.nio.ch.NioSocketImpl.write(NioSocketImpl.java:440)
                    at java.base/sun.nio.ch.NioSocketImpl$2.write(NioSocketImpl.java:826)
                    at java.base/java.net.Socket$SocketOutputStream.write(Socket.java:1052)
                    at java.base/sun.security.ssl.SSLSocketOutputRecord.encodeAlert(SSLSocketOutputRecord.java:82)
                    at java.base/sun.security.ssl.TransportContext.fatal(TransportContext.java:356)
                    ... 22 more
    Caused by: java.net.SocketException: Connection reset
            at java.base/sun.nio.ch.NioSocketImpl.implRead(NioSocketImpl.java:323)
            at java.base/sun.nio.ch.NioSocketImpl.read(NioSocketImpl.java:350)
            at java.base/sun.nio.ch.NioSocketImpl$1.read(NioSocketImpl.java:803)
            at java.base/java.net.Socket$SocketInputStream.read(Socket.java:982)
            at java.base/sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:457)
            at java.base/sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:68)
            at java.base/sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1411)
            at java.base/sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1022)
            ... 18 more
    

    My conclusion is that the AWS NAT gateway is doing the wrong thing.