I have been trying to figure out a case where a TCP connection between an HTTP client and an HTTP server lingers in the ESTABLISHED state. This happens for 1 or 2 connections out of 1000+. It is not clear whether the client or the server is at fault.
I wrote a Python script (using scapy) to capture all the TCP packets and find the root cause. I ran into this specific case, where the TCP SEQ and ACK numbers seem to mismatch, and it is confusing me.
Here is the interesting part of the log from the scapy script (after lots of packets on the same port, 53332):
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769374665 ack:844297577 len:0
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [PA] seq:769374665 ack:844297577 len:90
HTTP/1.1 200 OK
content-type: application/json; charset=UTF-8
content-length: 389255
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769374755 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [PA] seq:769383704 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769392653 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.3:53332 -> 10.0.1.2:8080 [ A] seq:844297577 ack:769383704 len:0
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [PA] seq:769401602 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769410551 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [PA] seq:769419500 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.3:53332 -> 10.0.1.2:8080 [ A] seq:844297577 ack:769401602 len:0
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769428449 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [PA] seq:769437398 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769446347 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [PA] seq:769455296 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769464245 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [PA] seq:769473194 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.3:53332 -> 10.0.1.2:8080 [ A] seq:844297577 ack:769446347 len:0
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769482143 ack:844297577 len:8949
2019-12-21 15:54:43 10.0.1.2:8080 -> 10.0.1.3:53332 [PA] seq:769491092 ack:844297577 len:8949
... scapy script must have missed several packets here ...
2019-12-21 15:54:43 10.0.1.3:53332 -> 10.0.1.2:8080 [ A] seq:844297577 ack:769750613 len:0
2019-12-21 15:54:43 10.0.1.3:53332 -> 10.0.1.2:8080 [ A] seq:844297577 ack:769764010 len:0
After a couple of hours:
2019-12-21 17:54:45 10.0.1.2:8080 -> 10.0.1.3:53332 [ A] seq:769764009 ack:844297577 len:0
2019-12-21 17:54:45 10.0.1.3:53332 -> 10.0.1.2:8080 [ A] seq:844297577 ack:769764010 len:0
At 15:54:43, the client responded with an ACK of 769764010, indicating that it had received all data up to, but not including, sequence number 769764010. Two hours later, the server sent a segment with a SEQ of 769764009, which is 1 less than that ACK, and the client again replied with an ACK of 769764010.
I am perplexed as to how SEQ can be less than ACK (or how ACK can be greater than SEQ). I have verified that on both systems the connection is still in the ESTABLISHED state, so neither side has sent a FIN, which would have advanced the sequence number by one.
What am I missing?
This is actually @user207421's answer, but that user chose to post it as a comment, so I am writing it up as an answer.
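There was no problem in the first place. These were TCP keep-alive packets. A TCP keep-alive probe is simply an ACK segment, carrying no data (or a single garbage byte), whose sequence number is set to one less than the next sequence number the sender would otherwise use on the connection. Because that byte has already been acknowledged, the receiver replies with an ACK for the sequence number it actually expects next, which is exactly the 769764009 / 769764010 exchange in the log. The two-hour gap before the probe also fits: two hours is the minimum default keep-alive idle time required by RFC 1122 and is the default on Linux (tcp_keepalive_time = 7200 seconds).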
Hence there was really no mismatch.
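
To make such segments easier to spot in a capture log like the one in the question, here is a minimal sketch of a check for them. It is only a heuristic under my own assumptions: the function name is_keepalive_probe and its parameters are made up for illustration and are not part of scapy, and the caller is assumed to track the highest ACK seen from the other side of the connection.

```python
SEQ_MODULUS = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def is_keepalive_probe(seq, payload_len, peer_ack):
    """Return True if a segment looks like a TCP keep-alive probe.

    Hypothetical helper, not a scapy API.
    seq         -- sequence number of the segment being inspected
    payload_len -- number of payload bytes the segment carries
    peer_ack    -- highest ACK number already seen from the other side
    """
    # A probe re-uses the last byte that the peer has already acknowledged.
    one_behind = seq == (peer_ack - 1) % SEQ_MODULUS
    # Probes carry no data, or a single garbage byte on some stacks.
    return one_behind and payload_len in (0, 1)


# Values taken from the capture in the question.
print(is_keepalive_probe(769764009, 0, 769764010))     # the 17:54:45 segment
print(is_keepalive_probe(769401602, 8949, 769383704))  # an ordinary data segment
```

The first call prints True and the second prints False. A retransmission of a single byte could match the same pattern, so the long idle gap before the segment is still worth checking by eye.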