I have a server that is running a select() loop that sometimes continues blocking when the client closes the connection from its side. The select() loop handles all other read/write operations correctly and sets the correct file descriptor in the fd_set, leading me to believe that it is not an issue with the file descriptor setup on the server side.
The way I planned on handling the client closing the connection was to have the select() break due to activity on the socket (closing it from the client side), see that the fd was set for that socket, and then try to read from it - and if the read returned 0, then close the connection. However, because the select() doesn't always return when the client side closes the connection, there is no attempt to check the fd_set and subsequently try to read from the socket.
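Roughly, the pattern I had in mind looks like this (a simplified sketch, not my actual code; client_fd is the already-accepted connection):

```c
#include <sys/select.h>
#include <unistd.h>

/* Simplified sketch: wait in select(), then treat a 0-byte read as a
   graceful close by the client. */
fd_set readfds;
FD_ZERO(&readfds);
FD_SET(client_fd, &readfds);

int ready = select(client_fd + 1, &readfds, NULL, NULL, NULL);
if (ready > 0 && FD_ISSET(client_fd, &readfds)) {
    char buf[4096];
    ssize_t n = read(client_fd, buf, sizeof(buf));
    if (n == 0) {
        close(client_fd);   /* peer closed its end gracefully */
    } else if (n > 0) {
        /* process n bytes of data as usual */
    }
}
```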
As a workaround, I implemented a "stop code" that the client writes to the server just before closing the connection, and this write causes the select() to break; the server reads the "stop code" and knows to close the socket. The only problem with this solution is that the "stop code" is an arbitrary string of bytes that could appear in regular traffic, as the normal data being written can contain random byte sequences that might include the "stop code". Is there a better way to handle the client closing the connection from its end? Or is the method I described the general "best practice"?
I think my issue has something to do with OpenSSL, as the connection in question is an OpenSSL tunnel, and it is the only file descriptor in the set causing problems.
The way I planned on handling the client closing the connection was to have the select() break due to activity on the socket (closing it from the client side), see that the fd was set for that socket, and then try to read from it - and if the read returned 0, then close the connection. However, because the select() doesn't always return when the client side closes the connection, there is no attempt to check the fd_set and subsequently try to read from the socket.
Regardless of whether you are using SSL or not, select() can tell you when the socket is readable (has data available to read), and a graceful closure is a readable condition (a subsequent read operation reports 0 bytes read). It is only abnormal disconnects that select() can't report (unless you use the exceptfds parameter, but even that is not always guaranteed). The best way to handle abnormal disconnects is to simply use timeouts in your own code. If you don't receive data from the client for a while, just close the connection. The client will have to send data periodically, such as a small heartbeat command, if it wants to stay connected.
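For instance, a rough sketch of that approach (assuming a single client socket, with client_fd and last_activity maintained by your own code):

```c
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Sketch: a select() timeout plus a per-connection idle timer to detect
   abnormal disconnects. client_fd and last_activity are assumed to be
   maintained elsewhere in your code. */
#define IDLE_LIMIT_SECS 30

fd_set readfds;
FD_ZERO(&readfds);
FD_SET(client_fd, &readfds);

struct timeval tv = { .tv_sec = 5, .tv_usec = 0 };  /* wake up periodically */
int ready = select(client_fd + 1, &readfds, NULL, NULL, &tv);

if (ready > 0 && FD_ISSET(client_fd, &readfds)) {
    /* read the data (or heartbeat command) and refresh the idle timer */
    last_activity = time(NULL);
} else if (time(NULL) - last_activity > IDLE_LIMIT_SECS) {
    /* no data and no heartbeat for too long: assume the client is gone */
    close(client_fd);
}
```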
Also, when using OpenSSL, if you are using the older SSL_... API functions (SSL_new(), SSL_set_fd(), SSL_read(), SSL_write(), etc), make sure you are NOT just blindly calling select() whenever you want; call it ONLY when OpenSSL tells you to (when an SSL read/write operation reports an SSL_ERROR_WANT_(READ|WRITE) error). This is an area where a lot of OpenSSL newbies tend to make the same mistake. They try to use OpenSSL on top of pre-existing socket logic that waits for a readable notification before then reading data. This is the wrong way to use the SSL_... API. You are expected to ask OpenSSL to perform a read/write operation unconditionally, and then, if it needs to wait for new data to arrive or for pending data to be sent out, it will tell you, and you can then call select() accordingly before retrying the SSL read/write operation.
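A rough sketch of that pattern (error handling trimmed; ssl and fd are assumed to be an already-connected SSL* and its underlying non-blocking socket):

```c
#include <openssl/ssl.h>
#include <sys/select.h>

/* Sketch: call SSL_read() unconditionally, and only select() when OpenSSL
   says it needs to wait for the socket. */
char buf[4096];
for (;;) {
    int n = SSL_read(ssl, buf, sizeof(buf));
    if (n > 0) {
        /* process n bytes of decrypted data */
        break;
    }

    int err = SSL_get_error(ssl, n);
    fd_set rfds, wfds;
    FD_ZERO(&rfds);
    FD_ZERO(&wfds);

    if (err == SSL_ERROR_WANT_READ) {
        FD_SET(fd, &rfds);            /* OpenSSL needs more ciphertext */
    } else if (err == SSL_ERROR_WANT_WRITE) {
        FD_SET(fd, &wfds);            /* e.g. a renegotiation needs to send */
    } else if (err == SSL_ERROR_ZERO_RETURN) {
        break;                        /* peer sent close_notify: graceful close */
    } else {
        break;                        /* fatal error or unexpected EOF */
    }

    select(fd + 1, &rfds, &wfds, NULL, NULL);  /* wait, then retry SSL_read() */
}
```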
On the other hand, if you are using the newer BIO_... API functions (BIO_new(), BIO_read(), BIO_write(), etc), you can take control of the underlying socket I/O and not let OpenSSL manage it for you, and thus you can do whatever you want with select() (or any other socket API you want).
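For example, with memory BIOs your own code owns the socket and just shuttles ciphertext between the socket and OpenSSL (a bare-bones sketch with no error handling; ctx and fd are assumed to already exist):

```c
#include <openssl/ssl.h>
#include <openssl/bio.h>
#include <sys/socket.h>

/* Sketch: with memory BIOs, OpenSSL only ever touches two in-memory
   buffers, and all real socket I/O stays in your own code. */
SSL *ssl  = SSL_new(ctx);
BIO *rbio = BIO_new(BIO_s_mem());   /* ciphertext from the network -> OpenSSL */
BIO *wbio = BIO_new(BIO_s_mem());   /* ciphertext from OpenSSL -> the network */
SSL_set_bio(ssl, rbio, wbio);       /* SSL_free(ssl) will free both BIOs */
SSL_set_accept_state(ssl);          /* server side of the handshake */

/* After select() says the socket is readable: */
char net[4096], plain[4096];
ssize_t n = recv(fd, net, sizeof(net), 0);
if (n > 0) {
    BIO_write(rbio, net, (int)n);                 /* feed ciphertext to OpenSSL */
    int p = SSL_read(ssl, plain, sizeof(plain));  /* may be <= 0 if more data is needed */
    /* if p > 0, process p bytes of plaintext */
}

/* Whenever OpenSSL has ciphertext to send (handshake or after SSL_write): */
int pending;
while ((pending = BIO_read(wbio, net, sizeof(net))) > 0)
    send(fd, net, (size_t)pending, 0);
```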
As a workaround, I implemented a "stop code" that the client writes to the server just before closing the connection, and this write causes the select() to break; the server reads the "stop code" and knows to close the socket.
That is a very common approach in many Internet protocols, regardless of whether SSL is used or not. It is a very distinct and explicit way for the client to say "I'm done" and both parties can then close their respective sockets.
The only problem with this solution is that the "stop code" is an arbitrary string of bytes that could appear in regular traffic, as the normal data being written can contain random byte sequences that might include the "stop code".
Then either your communication protocol is not designed properly, or your code is not processing the protocol correctly. In a properly-designed and correctly-processed protocol, there will not be any such ambiguity. There needs to be a clear distinction between the various commands that your protocol defines. Your "stop code" would be one such command amongst other commands. Random data in one command should not be mistakenly treated as a different command. If you are experiencing that problem, you need to fix it.
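For example, one common fix (a sketch of one possible design, not the only one) is to length-prefix every message and carry an explicit command code, so payload bytes can never be mistaken for the "stop code":

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Sketch: length-prefixed framing. The 4-byte length and 1-byte command
   code are the protocol; whatever bytes follow are opaque payload, so a
   payload can never be mistaken for the CMD_GOODBYE "stop code". */
enum { CMD_DATA = 1, CMD_HEARTBEAT = 2, CMD_GOODBYE = 3 };

size_t frame_message(uint8_t cmd, const void *payload, uint32_t len,
                     uint8_t *out /* must hold 5 + len bytes */)
{
    uint32_t nlen = htonl(len);
    memcpy(out, &nlen, 4);          /* payload length, network byte order */
    out[4] = cmd;                   /* command code */
    if (len)
        memcpy(out + 5, payload, len);
    return 5 + (size_t)len;
}
```

The receiver reads exactly 5 header bytes, then exactly len payload bytes, and only the command byte is ever interpreted as a command.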