I am working on a project in which many systems are connected to one another. To test my software, I simulate an environment in TCL in which all of these systems are connected to mine. The systems are all TCP servers, and my software connects to them as a TCP client. The tests run under Linux and under Windows (through the Cygwin environment).
Everything worked fine until I increased the number of systems. One of my servers then kept waking up as if data had been received, but reading the socket returned no data. The socket was still connected (it had not been closed). The problem appears only under Windows, not under Linux.
Installed versions:
To highlight this issue, I wrote a very simplified version of the test design in which I connect N TCP client sockets to N TCP server sockets. Each server is owned by a TCL thread, so N threads are created in addition to the main thread. The full script is shown below, where N corresponds to the variable nbSockets:
    set nbSockets 19
    set portNbStart 20000
    set ::auto_path [linsert $::auto_path end "Packages/Thread/Windows"]
    package require Thread
    # Variables used to wait for the receiver thread to be started
    tsv::set VARIABLES COND [thread::cond create]
    tsv::set VARIABLES MUTEX [thread::mutex create]
    # Create all receiver threads and their TCP socket server
    for {set i 0} {$i < $nbSockets} {incr i} {
        # Lock mutex that will be used to check the readiness of the receiver thread
        thread::mutex lock [tsv::get VARIABLES MUTEX]
        set portNb [expr {$portNbStart + $i}]
        tsv::set VARIABLES PORT_NB $portNb
        tsv::set VARIABLES THREAD_READY 0
        set threadIds($i) [thread::create {
            # Initialize variables
            set ::CLIENT_SOCKET ""
            set ::SERVER_SOCKET ""
            set ::PORT_NB [tsv::get VARIABLES PORT_NB]
            # Actions to perform when a message is received from the client
            proc handleMessage {chan} {
                set data [chan read $chan 4]
                if {[eof $chan]} {
                    puts "On port Nb $::PORT_NB, socket is closed"
                } else {
                    puts "On port Nb $::PORT_NB, data size = [string length $data]"
                }
            }
            # Configuration of the client socket when a new connection is received
            proc accept {chan addr port} {
                set ::CLIENT_SOCKET $chan
                chan configure $chan -translation binary -buffering none -blocking 0
                chan event $chan readable [list handleMessage $chan]
            }
            # Actions to perform on disconnection
            proc Disconnect {} {
                # Close server socket
                CloseServerSocket
                # Close client socket
                CloseClientSocket
            }
            # Only close server socket
            proc CloseServerSocket {} {
                close $::SERVER_SOCKET
                set ::SERVER_SOCKET ""
            }
            # Only close client socket
            proc CloseClientSocket {} {
                chan event $::CLIENT_SOCKET readable {}
                close $::CLIENT_SOCKET
                set ::CLIENT_SOCKET ""
            }
            # Accept incoming connections
            set ::SERVER_SOCKET [socket -server accept -myaddr 127.0.0.1 $::PORT_NB]
            # Inform the main thread that this one is ready to listen
            tsv::set VARIABLES THREAD_READY 1
            thread::mutex lock [tsv::get VARIABLES MUTEX]
            thread::cond notify [tsv::get VARIABLES COND]
            thread::mutex unlock [tsv::get VARIABLES MUTEX]
            # Enter the event loop
            thread::wait
        }]
        # Wait for the thread to be ready
        thread::cond wait [tsv::get VARIABLES COND] [tsv::get VARIABLES MUTEX] 10000; # Wait at most 10 seconds
        thread::mutex unlock [tsv::get VARIABLES MUTEX]
        if {[tsv::get VARIABLES THREAD_READY] != 1} {
            error "Could not launch TCP server thread on port number $portNb"
        }
    }
    # Connect to the receiver threads and send 4 bytes
    for {set i 0} {$i < $nbSockets} {incr i} {
        set portNb [expr {$portNbStart + $i}]
        puts "Sending data at port number $portNb"
        # Connect to the server
        set Sockets($i) [socket -myaddr 127.0.0.1 127.0.0.1 $portNb]
        # Wait a little
        after 200
        # Send a simple message consisting of a 4-byte integer representing the port number
        puts -nonewline $Sockets($i) [binary format I $portNb]
        flush $Sockets($i)
        # Wait a little again
        after 200
    }
    after 2000
    # Cleaning...
    puts "Closing..."
    for {set i 0} {$i < $nbSockets} {incr i} {
        thread::send $threadIds($i) {
            Disconnect
        }
        close $Sockets($i)
        thread::release $threadIds($i)
    }
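As a side note on the test design, the condition variable and mutex handshake could probably be dropped. The sketch below is not part of the original script: startServerThread is a hypothetical helper, and it relies on two documented behaviours of the Thread package, namely that thread::create without a script leaves the new thread waiting in its event loop, and that a synchronous thread::send returns only once the script has run in the target thread. It simplifies the startup only and is not expected to change the zero-length reads.

```tcl
package require Thread

# Hypothetical helper (not in the original script): start one server thread
# listening on $port and return its thread id.
proc startServerThread {port} {
    # With no script argument, the new thread simply waits in its event loop.
    set tid [thread::create]

    # A synchronous thread::send returns only after the script has run in the
    # target thread; an error raised there is reported here in the caller.
    thread::send $tid [list set ::PORT_NB $port]
    thread::send $tid {
        proc handleMessage {chan} {
            set data [chan read $chan 4]
            if {[chan eof $chan]} {
                chan close $chan
            } else {
                puts "On port Nb $::PORT_NB, data size = [string length $data]"
            }
        }
        proc accept {chan addr port} {
            chan configure $chan -translation binary -buffering none -blocking 0
            chan event $chan readable [list handleMessage $chan]
        }
        set ::SERVER_SOCKET [socket -server accept -myaddr 127.0.0.1 $::PORT_NB]
    }
    return $tid
}
```

With this helper, the creation loop would reduce to `set threadIds($i) [startServerThread $portNb]`, and a failure to open the listening socket would surface as an ordinary Tcl error instead of a 10 second timeout.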
The N server sockets listen on ports 20000 to 20000 + N - 1.
When I run the script with nbSockets = 19, I get the following (only a few lines are shown):
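Each client sends its port number as a 4-byte big-endian integer (binary format I), while the server only prints the length of what it read. A minimal sketch of how the payload itself could be checked follows; encodePort and decodePort are hypothetical names introduced for illustration, not procs from the script.

```tcl
# Encode a port number the way the client loop does: 4 bytes, big-endian.
proc encodePort {port} {
    return [binary format I $port]
}

# Hypothetical server-side counterpart: returns the decoded integer, or an
# empty string when fewer than 4 bytes are available to decode.
proc decodePort {data} {
    if {[binary scan $data I value] != 1} {
        return ""
    }
    return $value
}
```

In handleMessage, comparing `[decodePort $data]` with `$::PORT_NB` would confirm that a 4-byte read really carries the expected value, and an empty result would flag a short or empty read.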
    Sending data at port number 20000
    On port Nb 20000, data size = 4
    Sending data at port number 20001
    On port Nb 20001, data size = 4
    [...]
    Sending data at port number 20018
    On port Nb 20018, data size = 4
    Closing...
When I run the same script with nbSockets = 20, I obtain:
    Sending data at port number 20000
    On port Nb 20000, data size = 4
    Sending data at port number 20001
    On port Nb 20001, data size = 4
    [...]
    Sending data at port number 20018
    On port Nb 20018, data size = 4
    Sending data at port number 20019
    On port Nb 20019, data size = 0
    On port Nb 20019, data size = 0
    On port Nb 20019, data size = 0
    On port Nb 20019, data size = 0
    On port Nb 20019, data size = 0
    [...] The same line is repeated more than 9000 times
    On port Nb 20019, data size = 0
    On port Nb 20019, data size = 0
    Closing...
    On port Nb 20019, data size = 0
    On port Nb 20019, data size = 0
    On port Nb 20019, data size = 0
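To make the "data size = 0" lines easier to interpret, the read handler could separate three cases: end of file, a readable event that delivered no bytes, and real data. The following is only a diagnostic sketch; classifyRead is a hypothetical helper that is not in the script above, and it assumes the channel is already in non-blocking binary mode, as accept configures it. It does not explain why the events fire.

```tcl
# Hypothetical diagnostic helper: read up to $maxBytes from a non-blocking
# channel and report what kind of readable event this was.
# Returns a two-element list:
#   {eof   <data>}     the peer closed the connection
#   {empty <blocked>}  no bytes were delivered; <blocked> is [chan blocked]
#   {data  <data>}     one or more bytes were read
proc classifyRead {chan {maxBytes 4}} {
    set data [chan read $chan $maxBytes]
    if {[chan eof $chan]} {
        return [list eof $data]
    }
    if {[string length $data] == 0} {
        return [list empty [chan blocked $chan]]
    }
    return [list data $data]
}
```

Calling `lassign [classifyRead $chan] kind payload` inside handleMessage and printing `$kind` would show whether the repeated wake-ups are "empty" events, meaning the channel was reported readable although no input was available.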
So my questions are: Is there something wrong with my test design? (The same script does not show this issue under Linux.) Is there a known limitation of TCL running under the Cygwin environment? Is there a TCL parameter I don't know about that I should increase? I searched the web for an answer but didn't find anything relevant.
Thanks in advance!
After installing the MagicSplat TCL distribution (version 1.16.0), which does not use Cygwin, the problem no longer appears. So I tend to think that the issue was related to Cygwin, even though I can't explain how.