multithreading tcp tcl cygwin

Issue while receiving TCP data when 20 TCP sockets are connected in TCL running under Cygwin environment


I am working on a project in which many systems are connected to one another. To test my software, I simulate an environment in TCL in which all these systems are connected to mine. These systems are all TCP servers, and my software connects to them as a TCP client. The tests are run under Linux and under Windows (through the Cygwin environment).

Everything worked fine until I increased the number of systems. One of my servers then kept waking up as if data had been received, but reading the socket returned no data. The socket was still connected (it had not been closed). This problem appears only under Windows, not under Linux.

Installed versions:

To highlight this issue, I wrote a very simplified version of the test design in which I connect N TCP client sockets to N TCP server sockets. Each server is owned by a TCL thread. So, apart from the main thread, N threads are created. The full script is shown below, where N corresponds to the variable nbSockets:

set nbSockets 19
set portNbStart 20000

set ::auto_path [linsert $::auto_path end "Packages/Thread/Windows"]
package require Thread

# Variables used to wait for the receiver thread to be started 
tsv::set VARIABLES COND [thread::cond create]
tsv::set VARIABLES MUTEX [thread::mutex create]

# Create all receiver threads and their TCP socket server
for {set i 0} {$i < $nbSockets} {incr i} {

    # Lock mutex that will be used to check the readiness of the receiver thread
    thread::mutex lock [tsv::get VARIABLES MUTEX]
    
    set portNb [expr {$portNbStart + $i}]
    tsv::set VARIABLES PORT_NB $portNb
    tsv::set VARIABLES THREAD_READY 0

    set threadIds($i) [thread::create {
    
        # Initialize variables
        set ::CLIENT_SOCKET ""
        set ::SERVER_SOCKET ""
        set ::PORT_NB [tsv::get VARIABLES PORT_NB]
        
        # Actions to perform when a message is received from the client
        proc handleMessage {chan} {
            set data [chan read $chan 4]
            
            if {[eof $chan]} {
                puts "On port Nb $::PORT_NB, socket is closed"
            } else {
                puts "On port Nb $::PORT_NB, data size = [string length $data]"
            }
        }
        
        # Configuration of the client socket when a new connection is received
        proc accept {chan addr port} {
            set ::CLIENT_SOCKET $chan
            chan configure $chan -translation binary -buffering none -blocking 0
            chan event $chan readable [list handleMessage $chan]
        }
        
        # Actions to perform on disconnection
        proc Disconnect {} {
            # Close server socket
            CloseServerSocket

            # Close client socket
            CloseClientSocket
        }
        
        # Only close server socket
        proc CloseServerSocket {} {
            close $::SERVER_SOCKET
            set ::SERVER_SOCKET ""
        }
        
        # Only close client socket
        proc CloseClientSocket {} {
            chan event $::CLIENT_SOCKET readable {}
            close $::CLIENT_SOCKET
            set ::CLIENT_SOCKET ""
        }

        # Accept incoming connections
        set ::SERVER_SOCKET [socket -server accept -myaddr 127.0.0.1 $::PORT_NB]
        
        # Inform the main thread that this one is ready to listen
        tsv::set VARIABLES THREAD_READY 1
        thread::mutex lock [tsv::get VARIABLES MUTEX]
        thread::cond notify [tsv::get VARIABLES COND]
        thread::mutex unlock [tsv::get VARIABLES MUTEX]
        
        # Enter the event loop
        thread::wait
    }]
    
    # Wait for the thread to be ready
    thread::cond wait [tsv::get VARIABLES COND] [tsv::get VARIABLES MUTEX] 10000; # Wait at most 10 seconds
    thread::mutex unlock [tsv::get VARIABLES MUTEX]
    
    if {[tsv::get VARIABLES THREAD_READY] != 1} {
        error "Could not launch TCP server thread on port number $portNb"
    }
}

# Connect to the receiver threads and send 4 bytes
for {set i 0} {$i < $nbSockets} {incr i} {
    set portNb [expr {$portNbStart + $i}]

    puts "Sending data at port number $portNb"

    # Connect to the server
    set Sockets($i) [socket -myaddr 127.0.0.1 127.0.0.1 $portNb]
    
    # Wait a little
    after 200
    
    # Send a simple message consisting of a 4 bytes integer representing the port number
    puts -nonewline $Sockets($i) [binary format I $portNb]
    flush $Sockets($i)
    
    # Wait a little again
    after 200
}

after 2000

# Cleaning...
puts "Closing..."
for {set i 0} {$i < $nbSockets} {incr i} {
    thread::send $threadIds($i) {
        Disconnect
    }
    close $Sockets($i)
    
    thread::release $threadIds($i)
}

The N server sockets listen on ports 20000 through 20000 + N - 1.

When I run the script with nbSockets = 19, I obtain (I just show a few lines):

Sending data at port number 20000
On port Nb 20000, data size = 4
Sending data at port number 20001
On port Nb 20001, data size = 4
[...]
Sending data at port number 20018
On port Nb 20018, data size = 4
Closing...

When I run the same script with nbSockets = 20, I obtain:

Sending data at port number 20000
On port Nb 20000, data size = 4
Sending data at port number 20001
On port Nb 20001, data size = 4
[...]
Sending data at port number 20018
On port Nb 20018, data size = 4
Sending data at port number 20019
On port Nb 20019, data size = 0
On port Nb 20019, data size = 0
On port Nb 20019, data size = 0
On port Nb 20019, data size = 0
On port Nb 20019, data size = 0
[...] The same line is repeated more than 9000 times
On port Nb 20019, data size = 0
On port Nb 20019, data size = 0
Closing...
On port Nb 20019, data size = 0
On port Nb 20019, data size = 0
On port Nb 20019, data size = 0
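
To narrow down what these zero-length reads are, the handler could report more than the data size. The following is only a sketch: `classifyRead` is a hypothetical helper, not part of the script above, and the demonstration uses `chan pipe`, which assumes Tcl 8.6 and a Unix-like pipe where written data is readable immediately. It distinguishes an end-of-file, a read that found fewer bytes than requested (`chan blocked`), and a complete read.

```tcl
# Report why a readable event fired on a non-blocking channel.
# Returns "eof", "blocked <n>" or "data <n>", where <n> is the byte count read.
proc classifyRead {chan} {
    set data [chan read $chan 4]
    if {[chan eof $chan]} {
        return eof
    }
    if {[chan blocked $chan]} {
        # Fewer than the 4 requested bytes were available
        return [list blocked [string length $data]]
    }
    return [list data [string length $data]]
}

# Demonstration on a local pipe instead of a socket (assumption: Tcl 8.6)
lassign [chan pipe] readEnd writeEnd
chan configure $readEnd  -translation binary -blocking 0
chan configure $writeEnd -translation binary -buffering none

puts -nonewline $writeEnd [binary format I 20019]
puts [classifyRead $readEnd]   ;# called after 4 bytes were written
puts [classifyRead $readEnd]   ;# called again with nothing new written
chan close $writeEnd
puts [classifyRead $readEnd]   ;# called after the writer was closed
chan close $readEnd
```

If the handler on port 20019 reported `blocked 0` on every call, the channel would be signalling readability without any data behind it, which points at the notifier rather than at the test script.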

So my questions are: Is there something wrong with my test design? (The same script run under Linux does not have this issue.) Is there a known limitation with TCL running under the Cygwin environment? Is there a TCL parameter I don't know about that I should increase? I searched the web but didn't find anything relevant.

Thanks in advance!


Solution

  • After installing the MagicSplat TCL distribution (version 1.16.0), which does not use Cygwin, the problem no longer occurs. So I tend to think that this issue was related to Cygwin, even though I can't explain how.
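
One way to check whether the behaviour depends on the interpreter build rather than on the thread-per-server design is a single-threaded variant run under both distributions. The sketch below is an assumption-laden reduction, not the original test: all servers share the main event loop, the ports are chosen by the OS (port 0) instead of starting at 20000, and `onAccept`, `onReadable` and `emptyReads` are hypothetical names. It counts zero-length reads and gives up after 5 seconds.

```tcl
set nbSockets 20
set emptyReads 0
array set received {}

# Read up to 4 bytes; count wake-ups that deliver nothing.
proc onReadable {idx chan} {
    set data [chan read $chan 4]
    if {[chan eof $chan]} {
        chan close $chan
        return
    }
    if {[string length $data] == 0} {
        incr ::emptyReads
        return
    }
    set ::received($idx) [string length $data]
    if {[array size ::received] == $::nbSockets} {
        set ::done ok
    }
}

# Same channel settings as the accept proc of the threaded script
proc onAccept {idx chan addr port} {
    chan configure $chan -translation binary -buffering none -blocking 0
    chan event $chan readable [list onReadable $idx $chan]
}

for {set i 0} {$i < $nbSockets} {incr i} {
    # Port 0 lets the OS pick a free port (assumption: fixed ports are not essential)
    set servers($i) [socket -server [list onAccept $i] -myaddr 127.0.0.1 0]
    set port [lindex [chan configure $servers($i) -sockname] 2]
    set clients($i) [socket 127.0.0.1 $port]
    chan configure $clients($i) -translation binary
    puts -nonewline $clients($i) [binary format I $port]
    flush $clients($i)
}

# Run the event loop until every server got its 4 bytes, or 5 seconds pass
set timer [after 5000 {set ::done timeout}]
vwait ::done
after cancel $timer
puts "result=$done received=[array size received] emptyReads=$emptyReads"

foreach i [array names clients] {
    close $clients($i)
    close $servers($i)
}
```

If this variant also produces a large `emptyReads` count under the Cygwin build once `nbSockets` reaches 20, threads can be ruled out as the trigger.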