I am trying to understand the TCP reset problem mentioned in RFC 7230: HTTP/1.1 Message Syntax and Routing, § 6.6:
6.6. Tear-down
The Connection header field (Section 6.1) provides a "close" connection option that a sender SHOULD send when it wishes to close the connection after the current request/response pair.
So HTTP/1.1 has persistent connections, meaning that multiple HTTP request/response pairs can be sent on the same connection.
A client that sends a "close" connection option MUST NOT send further requests on that connection (after the one containing "close") and MUST close the connection after reading the final response message corresponding to this request.
A server that receives a "close" connection option MUST initiate a close of the connection (see below) after it sends the final response to the request that contained "close". The server SHOULD send a "close" connection option in its final response on that connection. The server MUST NOT process any further requests received on that connection.
So the client signals that it will close the connection by adding the Connection: close
header field to the last HTTP request, and it closes the connection only after it receives the HTTP response acknowledging that the server received the request.
A server that sends a "close" connection option MUST initiate a close of the connection (see below) after it sends the response containing "close". The server MUST NOT process any further requests received on that connection.
A client that receives a "close" connection option MUST cease sending requests on that connection and close the connection after reading the response message containing the "close"; if additional pipelined requests had been sent on the connection, the client SHOULD NOT assume that they will be processed by the server.
So the server signals that it will close the connection by adding the Connection: close
header field to the last HTTP response, and it closes the connection. But it closes the connection only after receiving which message acknowledging that the client received the HTTP response?
If a server performs an immediate close of a TCP connection, there is a significant risk that the client will not be able to read the last HTTP response. If the server receives additional data from the client on a fully closed connection, such as another request that was sent by the client before receiving the server's response, the server's TCP stack will send a reset packet to the client; unfortunately, the reset packet might erase the client's unacknowledged input buffers before they can be read and interpreted by the client's HTTP parser.
So in the case where the server initiates the close of the connection, if the server fully closes the connection right after sending the HTTP response with a Connection: close
header field to an initial HTTP request, then the client may not receive that HTTP response because it received a TCP reset packet response to a subsequent HTTP request that it sent after the initial HTTP request. But how can the TCP reset packet response to the subsequent HTTP request precede the HTTP response to the initial HTTP request?
To avoid the TCP reset problem, servers typically close a connection in stages. First, the server performs a half-close by closing only the write side of the read/write connection. The server then continues to read from the connection until it receives a corresponding close by the client, or until the server is reasonably certain that its own TCP stack has received the client's acknowledgement of the packet(s) containing the server's last response. Finally, the server fully closes the connection.
So in the case where the server initiates the close of the connection, the server only closes the write side of the connection right after sending the HTTP response with a Connection: close
header field to an initial HTTP request, and it closes the read side of the connection only after receiving a subsequent corresponding HTTP request with a Connection: close
header field or after waiting for a period long enough to assume that it received a TCP message acknowledging that the client received the HTTP response. But why would the client send a subsequent corresponding HTTP request with a Connection: close
header field after receiving the HTTP response with a Connection: close
header field, whereas paragraph 5 states: ‘A client that receives a "close" connection option MUST cease sending requests on that connection’?
It is unknown whether the reset problem is exclusive to TCP or might also be found in other transport connection protocols.
But why would the client send a subsequent corresponding HTTP request with a Connection: close header field after receiving the HTTP response with a Connection: close header field, whereas paragraph 5 states: ‘A client that receives a "close" connection option MUST cease sending requests on that connection’?
With HTTP pipelining the client can send new requests even though the response for a previous request (and thus the Connection: close
in this response) was not yet received. This is a slight optimization from only sending the next request after the response for the previous one was received, but it comes with the risk that this new request will not be processed by the server.
But how can the TCP reset packet response to the subsequent HTTP request precede the HTTP response to the initial HTTP request?
While the TCP RST will be send after the response it will be propagated early to the application. A TCP RST is sent if new data arrive at a socket which is already shut down for at least reading (i.e. close(fd)
or shutdown(fd, SHUT_RD)
). It will also be sent if there are still unprocessed data in the receive buffer of the socket on shutdown, i.e. like in the case of HTTP pipelining. Once a TCP RST is received by the peer, its socket will be marked as broken. On the next system call with this socket (i.e. typically a read
or write
) this error then will be delivered to the application—no matter if there would be still unread data in the receive buffer of the socket. These unread data are thus lost.
But it closes the connection only after receiving which message acknowledging that the client received the HTTP response?
It is not waiting for some application message from the client. It will first deliver the response with the Connection: close
, then read on the socket in order to determine the close of the connection by the client. Then it will also close the connection. This waiting for close should of course be done with a short timeout, because disrupted connections might cause connections to never be explicitly closed. Alternatively it could just wait some seconds and hope that the client got and processed the response in the mean time.