common-lisp usocket

What is the purpose of the socket-close in this Common Lisp example?


I found this example in the Common Lisp Cookbook, which shows how to start a TCP server with usocket.

The example creates a listening socket, accepts a connection, and then writes to it. The write is wrapped in an unwind-protect whose cleanup form closes the sockets, so that even after an error the port can be reused. I've modified the example to signal an error deliberately, but when I run it several times in a row I get a USOCKET:ADDRESS-IN-USE-ERROR. The behavior is the same if I remove the socket-close calls.

(load "~/quicklisp/setup.lisp")
(ql:quickload "usocket")

(let* ((socket (usocket:socket-listen "localhost" 8080))
       (connection (usocket:socket-accept socket)))
  (unwind-protect
       (progn
         (error "error 1"))
    (progn
      (usocket:socket-close connection)
      (usocket:socket-close socket)
      (print "Error clean up"))))
Unhandled USOCKET:ADDRESS-IN-USE-ERROR in thread #<SB-THREAD:THREAD "main thread" RUNNING
                                                    {10005E85B3}>:
  Condition USOCKET:ADDRESS-IN-USE-ERROR was signalled.

Backtrace for: #<SB-THREAD:THREAD "main thread" RUNNING {10005E85B3}>
0: (SB-DEBUG::DEBUGGER-DISABLED-HOOK #<USOCKET:ADDRESS-IN-USE-ERROR {1003FDC063}> #<unused argument> :QUIT T)
1: (SB-DEBUG::RUN-HOOK *INVOKE-DEBUGGER-HOOK* #<USOCKET:ADDRESS-IN-USE-ERROR {1003FDC063}>)
2: (INVOKE-DEBUGGER #<USOCKET:ADDRESS-IN-USE-ERROR {1003FDC063}>)
3: (ERROR #<USOCKET:ADDRESS-IN-USE-ERROR {1003FDC063}>)
4: (USOCKET:SOCKET-LISTEN "localhost" 8080 :REUSEADDRESS NIL :REUSE-ADDRESS NIL :BACKLOG 5 :ELEMENT-TYPE CHARACTER)
5: ((LAMBDA NIL :IN "/home/sam/test/serve.lisp"))
6: (SB-INT:SIMPLE-EVAL-IN-LEXENV (LET* ((SOCKET (USOCKET:SOCKET-LISTEN "localhost" 8080)) (CONNECTION (USOCKET:SOCKET-ACCEPT SOCKET))) (UNWIND-PROTECT (PROGN (ERROR "error 1")) (PROGN (USOCKET:SOCKET-CLOSE CONNECTION) (USOCKET:SOCKET-CLOSE SOCKET) (PRINT "Error clean up")))) #<NULL-LEXENV>)
7: (EVAL-TLF (LET* ((SOCKET (USOCKET:SOCKET-LISTEN "localhost" 8080)) (CONNECTION (USOCKET:SOCKET-ACCEPT SOCKET))) (UNWIND-PROTECT (PROGN (ERROR "error 1")) (PROGN (USOCKET:SOCKET-CLOSE CONNECTION) (USOCKET:SOCKET-CLOSE SOCKET) (PRINT "Error clean up")))) 2 NIL)
8: ((LABELS SB-FASL::EVAL-FORM :IN SB-INT:LOAD-AS-SOURCE) (LET* ((SOCKET (USOCKET:SOCKET-LISTEN "localhost" 8080)) (CONNECTION (USOCKET:SOCKET-ACCEPT SOCKET))) (UNWIND-PROTECT (PROGN (ERROR "error 1")) (PROGN (USOCKET:SOCKET-CLOSE CONNECTION) (USOCKET:SOCKET-CLOSE SOCKET) (PRINT "Error clean up")))) 2)
9: ((LAMBDA (SB-KERNEL:FORM &KEY :CURRENT-INDEX &ALLOW-OTHER-KEYS) :IN SB-INT:LOAD-AS-SOURCE) (LET* ((SOCKET (USOCKET:SOCKET-LISTEN "localhost" 8080)) (CONNECTION (USOCKET:SOCKET-ACCEPT SOCKET))) (UNWIND-PROTECT (PROGN (ERROR "error 1")) (PROGN (USOCKET:SOCKET-CLOSE CONNECTION) (USOCKET:SOCKET-CLOSE SOCKET) (PRINT "Error clean up")))) :CURRENT-INDEX 2)
10: (SB-C::%DO-FORMS-FROM-INFO #<CLOSURE (LAMBDA (SB-KERNEL:FORM &KEY :CURRENT-INDEX &ALLOW-OTHER-KEYS) :IN SB-INT:LOAD-AS-SOURCE) {1001B7128B}> #<SB-C::SOURCE-INFO {1001B71243}> SB-C::INPUT-ERROR-IN-LOAD)
11: (SB-INT:LOAD-AS-SOURCE #<SB-SYS:FD-STREAM for "file /home/sam/test/serve.lisp" {1001B66763}> :VERBOSE NIL :PRINT NIL :CONTEXT "loading")
12: ((FLET SB-FASL::THUNK :IN LOAD))
13: (SB-FASL::CALL-WITH-LOAD-BINDINGS #<CLOSURE (FLET SB-FASL::THUNK :IN LOAD) {7FFFF63E769B}> #<SB-SYS:FD-STREAM for "file /home/sam/test/serve.lisp" {1001B66763}>)
14: ((FLET SB-FASL::LOAD-STREAM :IN LOAD) #<SB-SYS:FD-STREAM for "file /home/sam/test/serve.lisp" {1001B66763}> NIL)
15: (LOAD #<SB-SYS:FD-STREAM for "file /home/sam/test/serve.lisp" {1001B66763}> :VERBOSE NIL :PRINT NIL :IF-DOES-NOT-EXIST T :EXTERNAL-FORMAT :DEFAULT)
16: ((FLET SB-IMPL::LOAD-SCRIPT :IN SB-IMPL::PROCESS-SCRIPT) #<SB-SYS:FD-STREAM for "file /home/sam/test/serve.lisp" {1001B66763}>)
17: ((FLET SB-UNIX::BODY :IN SB-IMPL::PROCESS-SCRIPT))
18: ((FLET "WITHOUT-INTERRUPTS-BODY-3" :IN SB-IMPL::PROCESS-SCRIPT))
19: (SB-IMPL::PROCESS-SCRIPT "serve.lisp")
20: (SB-IMPL::TOPLEVEL-INIT)
21: ((FLET SB-UNIX::BODY :IN SAVE-LISP-AND-DIE))
22: ((FLET "WITHOUT-INTERRUPTS-BODY-36" :IN SAVE-LISP-AND-DIE))
23: ((LABELS SB-IMPL::RESTART-LISP :IN SAVE-LISP-AND-DIE))

unhandled condition in --disable-debugger mode, quitting

Solution

  • You get this error because of how TCP works: the connection you closed is left in a state called TIME-WAIT in the TCP state machine described by RFC 793. A diagram of the state machine is on page 23 of RFC 793.

    The interesting part of the state machine is what happens when one end (which I'll call 'you') wants to close the connection. This is known as an 'active close', and here it is what you initiate with the socket-close calls. I'll call the other end 'them'. The normal sequence of events for an active close is:

    1. you send them a FIN packet;
    2. they ACK your FIN and then send their own FIN;
    3. you ACK their FIN.
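    The three steps above can be sketched as a small transition table for your side of the connection. This is a toy model, not usocket or operating-system code: the state keywords follow the RFC 793 state names, but the event keywords and the functions NEXT-STATE and RUN-ACTIVE-CLOSE are hypothetical names chosen for this sketch, and simultaneous close (the CLOSING state) is left out.

```lisp
;;; Toy model of the active-close path through the RFC 793 state machine.
;;; Event keywords and function names are invented for this sketch.

(defparameter *active-close-transitions*
  '(((:established :close)       . :fin-wait-1) ; step 1: you send FIN
    ((:fin-wait-1  :rcv-ack)     . :fin-wait-2) ; step 2: they ACK your FIN
    ((:fin-wait-2  :rcv-fin)     . :time-wait)  ; step 2: their FIN arrives; step 3: you ACK it
    ((:fin-wait-1  :rcv-fin-ack) . :time-wait)  ; their ACK and FIN in one segment; you ACK
    ((:time-wait   :rcv-fin)     . :time-wait)  ; retransmitted FIN: ACK again, restart timer
    ((:time-wait   :timeout)     . :closed))    ; 2 x MSL has elapsed
  "Alist mapping (STATE EVENT) to the next state. Simultaneous close
(the CLOSING state) is omitted to keep the sketch small.")

(defun next-state (state event)
  "Return the state reached from STATE when EVENT happens."
  (let ((entry (assoc (list state event) *active-close-transitions*
                      :test #'equal)))
    (if entry
        (cdr entry)
        (error "No transition from ~S on ~S in this sketch." state event))))

(defun run-active-close (events &optional (state :established))
  "Feed EVENTS through NEXT-STATE starting from STATE; return the final state."
  (dolist (event events state)
    (setf state (next-state state event))))
```

    For example, (run-active-close '(:close :rcv-ack :rcv-fin)) returns :TIME-WAIT: after the three steps you are not yet CLOSED, and only a :timeout event gets you there.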

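    Now it's important to remember that any of these packets can get lost, and the state machine needs to recover from this. (Their ACK and FIN are often combined into a single segment, but not always: if their application hasn't closed its end yet, they ACK your FIN immediately and send their own FIN later.)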

    The most interesting packet is the last ACK, because it is the last packet ever sent on the connection, which means you have no way of knowing whether it reached them.

    So consider the situation from both ends.

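    From their end: they've sent a FIN and are waiting for your ACK of it. Either your ACK arrives and they can close the connection, or it is lost, in which case they eventually time out and resend their FIN.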

    From your end: you've received their FIN and have sent the last ACK, but you have no idea whether that ACK ever reached them. So you wait for a prescribed time to give them a chance to realise that the ACK did not get there and to resend their FIN. During this wait you can't dismantle the connection, because at any point you may receive another FIN.
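    Why the connection must not be dismantled early can be sketched by asking what your end sends back when their retransmitted FIN arrives. This is a toy model with hypothetical function names, not real TCP code; the two answers follow RFC 793, which ACKs a retransmitted FIN in TIME-WAIT (and restarts the 2 × MSL timer) and answers a segment for a nonexistent connection with a reset.

```lisp
;;; Toy model (not real TCP code): how your end answers a FIN that they
;;; retransmit because your final ACK was lost. Names are hypothetical.

(defun answer-retransmitted-fin (your-state)
  "Return the segment your end sends back when a retransmitted FIN arrives.
YOUR-STATE is :TIME-WAIT if you are still waiting, or :CLOSED if the
connection's state has already been thrown away."
  (ecase your-state
    ;; Connection still known: ACK the FIN again (RFC 793 also restarts
    ;; the 2 x MSL timer at this point).
    (:time-wait :ack)
    ;; No such connection any more: RFC 793 answers with a reset.
    (:closed :rst)))

(defun peer-outcome (your-state)
  "How they, waiting in the LAST-ACK state, see the connection end."
  (ecase (answer-retransmitted-fin your-state)
    (:ack :clean-close)
    (:rst :connection-reset)))
```

    With these definitions (peer-outcome :time-wait) returns :CLEAN-CLOSE, while (peer-outcome :closed) returns :CONNECTION-RESET: forgetting the connection too soon turns their orderly close into an error.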

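    This waiting state is known as TIME-WAIT, and while the connection is in it, its local address and port can't be reused unless the new socket explicitly asks for address reuse. This is the problem you are seeing: the next run's socket-listen tries to bind port 8080, which still belongs to the connection in TIME-WAIT, so the bind fails and usocket signals ADDRESS-IN-USE-ERROR. Removing the socket-close calls doesn't help either, because when the Lisp process exits the operating system closes its remaining sockets for it, which normally leads to the same state.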

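    The connection stays in TIME-WAIT for twice the maximum segment lifetime (2 × MSL), where the MSL is the longest time a segment is assumed to survive in the network. RFC 793 arbitrarily sets the MSL to 2 minutes, which gives a 4-minute TIME-WAIT; many systems use a shorter period (Linux, for example, uses a fixed 60 seconds).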
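    If you don't want to wait out TIME-WAIT between runs, the usual remedy is to ask for address reuse (the SO_REUSEADDR socket option) on the listening socket. The backtrace in the question shows that usocket:socket-listen accepts a :reuse-address keyword, which defaulted to NIL there. The sketch below assumes that passing :reuse-address t sets SO_REUSEADDR on your Lisp implementation (check the usocket documentation for yours); SERVE-ONE-CONNECTION is a hypothetical name. It also nests the unwind-protect forms so the listening socket is closed even if socket-accept itself fails, which the question's version does not guarantee.

```lisp
;; Assumes Quicklisp is installed in ~/quicklisp, as in the question.
(load "~/quicklisp/setup.lisp")
(ql:quickload "usocket")

(defun serve-one-connection (&key (host "localhost") (port 8080))
  "Accept one connection on HOST:PORT, send a line, then close both sockets.
SERVE-ONE-CONNECTION is a hypothetical name used only for this sketch."
  ;; :REUSE-ADDRESS T asks for SO_REUSEADDR (assumption: your usocket
  ;; backend maps it that way), so this bind can succeed while a
  ;; connection from an earlier run is still in TIME-WAIT.
  (let ((socket (usocket:socket-listen host port :reuse-address t)))
    (unwind-protect
         (let ((connection (usocket:socket-accept socket)))
           (unwind-protect
                (let ((stream (usocket:socket-stream connection)))
                  (format stream "Hello~%")
                  (force-output stream))
             ;; Runs on normal exit and when an error unwinds the stack.
             (usocket:socket-close connection)))
      ;; The listening socket is closed even if SOCKET-ACCEPT fails.
      (usocket:socket-close socket))))
```

    Note that this does not remove TIME-WAIT: the old connection still goes through it. Address reuse only lets the new listening socket bind the port alongside it, so closing the sockets remains necessary to release them promptly.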

    Other waits can of course occur before TIME-WAIT if earlier packets get lost, but TIME-WAIT is the only one that happens on every normal active close, even when no packets are lost.

    TIME-WAIT is often written TIME_WAIT (for example in netstat output), because many programming languages don't allow hyphens in identifiers.