c sockets timeout nonblocking

How to get a timeout to work when connecting to a socket


I'm trying to supply a timeout for connect(). No error is reported by getsockopt(), but when I then call write(), it fails with errno 107 (ENOTCONN).

I'm running on Fedora 23. The docs for connect() say it should fail with an errno of EINPROGRESS when the connection cannot be completed immediately; however, I was getting EAGAIN, so I added that to my check.

Currently my socket server sets the backlog to zero in its listen() call. Many of the calls succeed, but the ones that fail all fail in write() with the 107 (ENOTCONN) error mentioned above.

int domain_socket_send(const char* socket_name, unsigned char* buffer,
        unsigned int length, unsigned int timeout)
{
    struct sockaddr_un addr;
    int fd = -1;
    int result = 0;

    // Create socket.
    
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        {
        result = -1;
        goto done;
        }

    if (timeout != 0)
        {
        
        // Enable non-blocking mode.

        int flags;
        flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
        }

    // Set socket name.
    
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, socket_name, sizeof(addr.sun_path) - 1);

    // Connect.
    
    result = connect(fd, (struct sockaddr*) &addr, sizeof(addr));
    if (result == -1)
        {

        // If some error then we're done.
        
        if ((errno != EINPROGRESS) && (errno != EAGAIN))
            goto done;

        fd_set write_set;
        struct timeval tv;

        // Set timeout.
        
        tv.tv_sec = timeout / 1000000;
        tv.tv_usec = timeout % 1000000;

        unsigned int iterations = 0;
        while (1)
            {
            FD_ZERO(&write_set);
            FD_SET(fd, &write_set);

            result = select(fd + 1, NULL, &write_set, NULL, &tv);
            if (result == -1)
                goto done;
            else if (result == 0)
                {
                result = -1;
                errno = ETIMEDOUT;
                goto done;
                }
            else
                {
                if (FD_ISSET(fd, &write_set))
                    {
                    socklen_t len;
                    int socket_error;
                    len = sizeof(socket_error);
            
                    // Get the result of the connect() call.
            
                    result = getsockopt(fd, SOL_SOCKET, SO_ERROR,
                            &socket_error, &len);
                    if (result == -1)
                        goto done;

                    // I think SO_ERROR will be zero for a successful
                    // result and errno otherwise.

                    if (socket_error != 0)
                        {
                        result = -1;
                        errno = socket_error;
                        goto done;
                        }

                    // Now that the socket is writable issue another connect.
            
                    result = connect(fd, (struct sockaddr*) &addr,
                            sizeof(addr));
                    if (result == 0)
                        {
                        if (iterations > 1)
                            {
                            printf("connect() succeeded on iteration %u\n",
                                    iterations);
                            }
                        break;
                        }
                    else
                        {
                        if ((errno != EAGAIN) && (errno != EINPROGRESS))
                            {
                            int err = errno;
                            printf("second connect() failed, errno = %d\n",
                                    errno);
                            errno = err;
                            goto done;
                            }
                        iterations++;
                        }
                    }
                }
            }
        }

    // If we put the socket in non-blocking mode then put it back
    // to blocking mode.

    if (timeout != 0)
        {
        
        // Turn off non-blocking.

        int flags;
        flags = fcntl(fd, F_GETFL);
        fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
        }

    // Write buffer.

    result = write(fd, buffer, length);
    if (result == -1)
        {
        int err = errno;
        printf("write() failed, errno = %d\n", err);
        errno = err;
        goto done;
        }

done:
    if (result == -1)
        result = errno;
    else
        result = 0;
    if (fd != -1)
        {
        shutdown(fd, SHUT_RDWR);
        close(fd);
        }
    return result;
}

It dawned on me that maybe I need to call connect() repeatedly until it succeeds. After all, this is non-blocking I/O, not async I/O, just as I have to call read() again once there is data to read after getting EAGAIN from a read(). In addition, I found the following SO question: "Using select() for non-blocking sockets to connect always returns 1", in which EJP's answer says you need to issue multiple connect() calls.

Also, the book EJP references seems to indicate that you need to issue multiple connect() calls.

I've modified the code snippet in this question to call connect() until it succeeds. I probably still need to make other changes, such as updating the timeout value passed to select(), but that's not my immediate question.
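As an aside on updating the timeout passed to select(): Linux modifies the timeval to reflect the time not slept, but POSIX leaves that unspecified, so a portable loop usually recomputes the remaining time from a fixed deadline before each call. The following is only a sketch of that idea; the helper names now_usec() and remaining_timeval() are hypothetical and not part of the code above.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Hypothetical helper: current monotonic time in microseconds. */
static uint64_t now_usec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Hypothetical helper: fill *tv with the time left until deadline_usec
 * (a value previously computed as now_usec() + timeout).
 * Returns 1 if time remains, 0 if the deadline has already passed
 * (in which case *tv is set to zero). */
static int remaining_timeval(uint64_t deadline_usec, struct timeval *tv)
{
    uint64_t now = now_usec();
    uint64_t left;

    if (now >= deadline_usec) {
        tv->tv_sec = 0;
        tv->tv_usec = 0;
        return 0;
    }
    left = deadline_usec - now;
    tv->tv_sec = (time_t)(left / 1000000u);
    tv->tv_usec = (suseconds_t)(left % 1000000u);
    return 1;
}
```

In the loop above, one would compute the deadline once before the while (1), call remaining_timeval() at the top of each iteration, and treat a return of 0 the same way as select() returning 0 (ETIMEDOUT).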

Calling connect() multiple times appears to have fixed my original problem, which was getting ENOTCONN when calling write(), presumably because the socket was not connected.

However, as you can see from the code, I'm tracking how many times the select() loop runs before connect() succeeds. I've seen the number go into the thousands, which makes me worried that I'm in a busy-wait loop.

Why is the socket writable even though it's not in a state in which connect() will succeed? Is calling connect() clearing the writable state, which the OS then sets again for some reason, or am I really in a busy-wait loop?


Solution

  • From http://lxr.free-electrons.com/source/net/unix/af_unix.c:

    static int unix_writable(const struct sock *sk)
    {
            return sk->sk_state != TCP_LISTEN &&
                   (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
    }
    

    I'm not sure what these buffers being compared are, but it is clear that the connected state of the socket is not checked. So unless these buffers are modified when the socket becomes connected, it appears my Unix socket will always be marked as writable, and thus I can't use select() to determine when the non-blocking connect() has finished.
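To check this reading of unix_writable() from user space, a small probe can ask select() whether a freshly created, never-connected AF_UNIX stream socket is already writable. This is only a sketch; the function name unconnected_unix_socket_is_writable() is made up for illustration, and the result may differ across kernels and operating systems.

```c
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if select() reports a brand-new,
 * unconnected AF_UNIX stream socket as writable, 0 if it does not,
 * and -1 if socket() or select() fails. */
static int unconnected_unix_socket_is_writable(void)
{
    fd_set write_set;
    struct timeval tv = { 0, 0 };   /* zero timeout: poll, do not block */
    int fd, n, result;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    FD_ZERO(&write_set);
    FD_SET(fd, &write_set);

    n = select(fd + 1, NULL, &write_set, NULL, &tv);
    if (n == -1)
        result = -1;
    else if (n > 0 && FD_ISSET(fd, &write_set))
        result = 1;
    else
        result = 0;

    close(fd);
    return result;
}
```

If the kernel code quoted above is what runs, I would expect this to return 1, which would explain why the select() loop in the question spins without ever blocking.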

    Also, based on this snippet from the same file (http://lxr.free-electrons.com/source/net/unix/af_unix.c):

    static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
                                   int addr_len, int flags)
    .
    .
    .
            timeo = sock_sndtimeo(sk, flags & O_NONBLOCK);
    .
    .
    .
            if (unix_recvq_full(other)) {
                    err = -EAGAIN;
                    if (!timeo)
                            goto out_unlock;

                    timeo = unix_wait_for_peer(other, timeo);
    .
    .
    .
    

    it appears that setting the send timeout might be able to time out the connect, which also matches the documentation for SO_SNDTIMEO at http://man7.org/linux/man-pages/man7/socket.7.html.

    Thanks, Nick
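For anyone who wants to try the SO_SNDTIMEO idea, here is a minimal sketch that leaves the socket in blocking mode and sets a send timeout before calling connect(). The function name unix_connect_sndtimeo() is hypothetical. Judging by the kernel snippet above, I assume a connect() that times out against a full backlog fails with EAGAIN, but I have not confirmed that on other kernel versions.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect a blocking AF_UNIX stream socket,
 * bounding the wait with SO_SNDTIMEO. timeout_usec == 0 means no
 * timeout. Returns the connected fd, or -1 with errno set. */
static int unix_connect_sndtimeo(const char *socket_name,
        unsigned int timeout_usec)
{
    struct sockaddr_un addr;
    struct timeval tv;
    size_t name_len = strlen(socket_name);
    int fd, saved_errno;

    if (name_len >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    if (timeout_usec != 0) {
        tv.tv_sec = timeout_usec / 1000000;
        tv.tv_usec = timeout_usec % 1000000;
        if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1)
            goto fail;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, socket_name, name_len);  /* already zero-terminated */

    /* Assumption: if the listener's backlog stays full for the whole
     * timeout, this fails with errno == EAGAIN. */
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1)
        goto fail;

    return fd;

fail:
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return -1;
}
```

Because the socket never enters non-blocking mode, there is no select() loop and no second connect(); the caller can go straight to write() on the returned fd. Note that the send timeout stays in effect for later write() calls unless it is reset.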