When trying to establish a large number of TCP connections in parallel, I observe some weird behavior that I consider a potential bug in gen_tcp.
The scenario is a server listening on a port with multiple concurrent acceptors. From a client I establish a connection by calling gen_tcp:connect/3, then send a "Ping" message to the server and wait in passive mode for a "Pong" response. When I perform the gen_tcp:connect/3 calls sequentially, everything works fine, even for a large number of connections (I tested up to ~28000).
The problem occurs when I try to establish many connections in parallel (depending on the machine, somewhere between ~75 and several hundred). While most of the connections still get established, some fail with {error, closed} in gen_tcp:recv/3. The weird thing is that these connections did not fail earlier: the calls to gen_tcp:connect/3 and gen_tcp:send/2 were both successful (they returned {ok, Socket} and ok, respectively). On the server side I don't see a matching connection for these "weird" connections, i.e. no gen_tcp:accept/1 call returns for them. My understanding is that a successful gen_tcp:connect/3 should always result in a matching accepted connection on the server side.
I have already filed a bug report; it contains a more detailed description and a minimal code example that demonstrates the problem. I was able to reproduce the problem on Linux and Mac OS X, and with different Erlang versions.
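The minimal example from the bug report is not reproduced here. The following is a hypothetical reconstruction of the described scenario, not the original code. The module name pingpong_repro, the loopback address, the 4-byte message framing and the 5000 ms receive timeout are all assumptions. The listen socket keeps gen_tcp's default options, including the default backlog, several acceptor processes share it, and each client connects, sends "Ping" and waits passively for "Pong".

```erlang
%% pingpong_repro.erl: hypothetical sketch of the scenario in the question,
%% not the code from the bug report.
-module(pingpong_repro).
-export([run/2]).

%% Start NumAcceptors acceptors on an ephemeral port (default backlog),
%% then run NumClients ping/pong clients in parallel.
run(NumAcceptors, NumClients) ->
    {ok, LSock} = gen_tcp:listen(0, [binary, {active, false}, {reuseaddr, true}]),
    {ok, Port} = inet:port(LSock),
    _ = [spawn_link(fun() -> acceptor(LSock) end)
         || _ <- lists:seq(1, NumAcceptors)],
    Parent = self(),
    Pids = [spawn_link(fun() -> Parent ! {self(), client(Port)} end)
            || _ <- lists:seq(1, NumClients)],
    %% One result per client: {ok, <<"Pong">>} on success, or an error
    %% such as {error, closed} for the "weird" connections.
    Results = [receive {Pid, Result} -> Result end || Pid <- Pids],
    ok = gen_tcp:close(LSock),
    Results.

%% Each acceptor serves one connection at a time and loops until the
%% listen socket is closed, at which point accept returns an error.
acceptor(LSock) ->
    case gen_tcp:accept(LSock) of
        {ok, Sock} ->
            case gen_tcp:recv(Sock, 4, 5000) of
                {ok, <<"Ping">>} -> _ = gen_tcp:send(Sock, <<"Pong">>);
                _Other -> ok
            end,
            _ = gen_tcp:close(Sock),
            acceptor(LSock);
        {error, _} ->
            ok
    end.

%% Client: connect, send "Ping", then wait passively for the 4-byte reply.
client(Port) ->
    case gen_tcp:connect({127, 0, 0, 1}, Port, [binary, {active, false}]) of
        {ok, Sock} ->
            Result = case gen_tcp:send(Sock, <<"Ping">>) of
                         ok -> gen_tcp:recv(Sock, 4, 5000);
                         SendError -> SendError
                     end,
            _ = gen_tcp:close(Sock),
            Result;
        ConnectError ->
            ConnectError
    end.
```

With a handful of clients every result should be {ok, <<"Pong">>}. According to the question, failures only start to show up somewhere between ~75 and several hundred parallel clients, depending on the machine, so results at that scale will vary.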
My questions here are:
TCP 3-way handshake Client Server
connect()│──┐ │listen()
│ └──┐ │
│ SYN │
│ └──┐ │
│ └▶│ STATE
│ ┌──│SYN-RECEIVED
│ ┌──┘ │
│ SYN-ACK │
│ ┌──┘ │
STATE │◀┘ │
ESTABLISHED│──┐ │
│ └──┐ │
│ └ACK │
│ └──┐ │ STATE
│ └▶│ESTABLISHED
▽ ▽
The problem lies in the finer details of the 3-way handshake that establishes a TCP connection, and in the queue for incoming connections at the listen socket. See this excellent article for details; much of the following explanation is based on it.
In Linux there are actually two queues for incoming connections. When the server receives a connection request (SYN packet) and transitions to the state SYN-RECEIVED, the connection is placed in the SYN queue. When the corresponding ACK is received, the connection is moved to the accept queue, from which the application consumes it. The {backlog, N} option (default: 5) of gen_tcp:listen/2 determines the length of the accept queue.
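Since the accept queue length is set through {backlog, N}, the obvious server-side knob is to pass a larger value to gen_tcp:listen/2. The helper below is a hypothetical sketch (the module name backlog_listen and its open/2 function are illustrative, not part of any library). Note that on Linux the kernel silently caps the requested value at net.core.somaxconn, so a very large N is accepted but not necessarily honored.

```erlang
%% backlog_listen.erl: hypothetical helper that opens a passive listen
%% socket with an explicit accept-queue length.
-module(backlog_listen).
-export([open/2]).

open(Port, Backlog) ->
    gen_tcp:listen(Port, [binary,
                          {active, false},       %% passive mode, as in the question
                          {reuseaddr, true},
                          {backlog, Backlog}]).  %% gen_tcp's default is 5
```

Raising the backlog only buys headroom; the acceptors still have to drain the queue faster than connections arrive.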
When the server receives an ACK while the accept queue is full, the ACK is basically ignored and no RST is sent to the client. There is a timeout associated with the SYN-RECEIVED state: if no ACK is received (or it is ignored, as is the case here), the server resends the SYN-ACK, and the client then resends the ACK. If the application consumes an entry from the accept queue before the maximum number of SYN-ACK retries has been reached, the server will eventually process one of the duplicate ACKs and transition to the state ESTABLISHED. If the maximum number of retries has been reached, the server sends a RST to the client to reset the connection.
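On Linux, the maximum number of SYN-ACK retries mentioned above is the sysctl net.ipv4.tcp_synack_retries, and the backlog cap is net.core.somaxconn. The sketch below reads both from /proc/sys so they can be checked from an Erlang shell. The module name tcp_sysctl is hypothetical, and the functions simply return undefined on systems without these files (for example Mac OS X, which uses different tunables).

```erlang
%% tcp_sysctl.erl: hypothetical helpers for inspecting Linux TCP tunables.
-module(tcp_sysctl).
-export([synack_retries/0, somaxconn/0]).

%% Number of SYN-ACK retransmissions before a SYN-RECEIVED entry is dropped.
synack_retries() -> read_int("/proc/sys/net/ipv4/tcp_synack_retries").

%% Upper bound the kernel applies to the listen() backlog.
somaxconn() -> read_int("/proc/sys/net/core/somaxconn").

%% Parse the integer in a /proc/sys file, or return undefined if the
%% file cannot be read (non-Linux system, restricted container, ...).
read_int(Path) ->
    case file:read_file(Path) of
        {ok, Bin} -> binary_to_integer(string:trim(Bin));
        {error, _} -> undefined
    end.
```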
Coming back to the behavior observed when starting lots of connections in parallel: the explanation is that the accept queue at the server fills up faster than our application consumes the accepted connections. The gen_tcp:connect/3 calls on the client side return successfully as soon as they receive the first SYN-ACK, and gen_tcp:send/2 succeeds as well, because it only hands the data to the local kernel's send buffer. The connections do not get reset immediately, because the server retries the SYN-ACK. The server does not report these connections as accepted, because they are still in the state SYN-RECEIVED. Only when the retries are exhausted and the server sends the RST does the pending gen_tcp:recv/3 on the client return {error, closed}.
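The key point, that a successful gen_tcp:connect/3 does not imply a returned gen_tcp:accept/1, can be observed directly: the kernel completes handshakes on a listen socket even if the application never calls accept. The sketch below is a hypothetical demo (module and function names are illustrative, the 5000 ms connect timeout is an assumption). It does not reproduce the overflow itself, which is timing dependent; it only shows that connect and send succeed independently of accept.

```erlang
%% connect_before_accept.erl: hypothetical demo that connect/send succeed
%% even though no process ever calls gen_tcp:accept/1.
-module(connect_before_accept).
-export([demo/1]).

%% Opens a listen socket with backlog N and never accepts on it, then
%% connects N clients one after another. With N within the accept-queue
%% capacity the kernel completes every handshake on its own.
demo(N) ->
    {ok, LSock} = gen_tcp:listen(0, [binary, {active, false}, {backlog, N}]),
    {ok, Port} = inet:port(LSock),
    Results = [try_connect(Port) || _ <- lists:seq(1, N)],
    ok = gen_tcp:close(LSock),
    Results.

%% Returns ok if both connect and send succeeded, otherwise the error.
try_connect(Port) ->
    case gen_tcp:connect({127, 0, 0, 1}, Port, [binary, {active, false}], 5000) of
        {ok, Sock} ->
            Sent = gen_tcp:send(Sock, <<"Ping">>),
            _ = gen_tcp:close(Sock),
            Sent;
        {error, _} = Error ->
            Error
    end.
```

Every call returns ok although the server side never sees a single accepted connection, which mirrors what the question observed for the "weird" connections.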
On BSD-derived systems (including Mac OS X), the queue for incoming connections works a bit differently; see the article mentioned above.