Consider the following client and server components:
import java.io.InputStream;
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class client {
    public static void main(String[] args) throws IOException {
        while (true) {
            URL url = new URL("http://localhost:8000");
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("GET");
            int statusCode = connection.getResponseCode();
            System.out.println("Status Code: " + statusCode);
            // disconnect() closes the underlying socket, so every iteration
            // opens a brand-new TCP connection to the server.
            connection.disconnect();
        }
    }
}
import java.io.OutputStream;
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class server {
    public static void main(String[] args) throws IOException {
        ServerSocket serverSocket = new ServerSocket(8000);
        while (true) {
            Socket clientSocket = serverSocket.accept();
            OutputStream outputStream = clientSocket.getOutputStream();
            // Minimal HTTP/1.1 response with an empty body, then close the
            // socket, so no connection is ever kept alive.
            outputStream.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n".getBytes());
            outputStream.flush();
            clientSocket.close();
        }
    }
}
Upon running the client while the server is running, you will soon see the client start to hang, with the connection stuck in SYN_SENT at the TCP level (for about 30 seconds in total):
$ watch -n 0.1 "ss -on state syn-sent '( dport = :8000 )'"
Every 0.1s: ss -on state syn-sent '( dport = :8000 )' myhost: Tue Jul 16 04:08:52 2024
Netid Recv-Q Send-Q Local Address:Port Peer Address:Port Process
tcp 0 1 [::ffff:127.0.0.1]:60418 [::ffff:127.0.0.1]:8000 timer:(on,3.731ms,2)
$ pkill -3 java
# Stack trace of the client's main thread while it hangs, printed to the Java terminal...
"main" #1 prio=5 os_prio=0 cpu=2429.68ms elapsed=40.96s tid=0x000079e6c40266c0 nid=0x18a1c6 runnable [0x000079e6cb9fd000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.Net.connect0(java.base@17.0.11/Native Method)
at sun.nio.ch.Net.connect(java.base@17.0.11/Net.java:579)
at sun.nio.ch.Net.connect(java.base@17.0.11/Net.java:568)
at sun.nio.ch.NioSocketImpl.connect(java.base@17.0.11/NioSocketImpl.java:593)
at java.net.Socket.connect(java.base@17.0.11/Socket.java:633)
at java.net.Socket.connect(java.base@17.0.11/Socket.java:583)
at sun.net.NetworkClient.doConnect(java.base@17.0.11/NetworkClient.java:183)
at sun.net.www.http.HttpClient.openServer(java.base@17.0.11/HttpClient.java:533)
at sun.net.www.http.HttpClient.openServer(java.base@17.0.11/HttpClient.java:638)
at sun.net.www.http.HttpClient.<init>(java.base@17.0.11/HttpClient.java:281)
at sun.net.www.http.HttpClient.New(java.base@17.0.11/HttpClient.java:386)
at sun.net.www.http.HttpClient.New(java.base@17.0.11/HttpClient.java:422)
at sun.net.www.protocol.http.HttpURLConnection.setNewClient(java.base@17.0.11/HttpURLConnection.java:831)
at sun.net.www.protocol.http.HttpURLConnection.setNewClient(java.base@17.0.11/HttpURLConnection.java:819)
at sun.net.www.protocol.http.HttpURLConnection.writeRequests(java.base@17.0.11/HttpURLConnection.java:759)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(java.base@17.0.11/HttpURLConnection.java:1708)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(java.base@17.0.11/HttpURLConnection.java:1611)
at java.net.HttpURLConnection.getResponseCode(java.base@17.0.11/HttpURLConnection.java:529)
at client.main(client.java:13)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@17.0.11/Native Method)
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@17.0.11/NativeMethodAccessorImpl.java:77)
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.11/DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(java.base@17.0.11/Method.java:568)
at com.sun.tools.javac.launcher.Main.execute(jdk.compiler@17.0.11/Main.java:419)
at com.sun.tools.javac.launcher.Main.run(jdk.compiler@17.0.11/Main.java:192)
at com.sun.tools.javac.launcher.Main.main(jdk.compiler@17.0.11/Main.java:132)
I'm building a Java application where I need to rapidly send out HTTP requests just like this for a period of time (I can't use WebSockets). I was attempting a 10 ms polling interval when I encountered this problem; 30 ms seemed to work for me. So, my question is: why is this hang happening, and how can I fix it?
My best attempt at fixing it so far was raising the number of available file descriptors on both sides (ulimit -n unlimited), but to no avail.
Testing more now... I can also reproduce the same thing with Python:
import requests

while True:
    response = requests.get("http://localhost:8000")
    print(f"Status Code: {response.status_code}")
Then run python -m http.server for the server and you will get the same SYN_SENT hang. So it looks like the issue may run deeper than I originally anticipated, but I'm curious and open to hearing any potential remedy.
My desired behavior is for this SYN_SENT hang not to happen at all. I want to be able to set the HTTP request polling interval very low (even 1 ms between requests on the local network; as long as there's no resource leakage and all requests happen in series, I don't see why that shouldn't be achievable) while still having my Java application, or any other application, work in a perfectly robust manner. I also want to understand the problem, though. The Wireshark view confuses me because it shows the server returning its HTTP 200 OK response, yet for some reason the client hangs while reading it (I've tried to include everything needed to reproduce what I'm seeing, since I've been debugging this for hours). Thank you for your time.
I'm glad to say I've found the root cause of this issue! I noticed that whenever I was getting these SYN_SENT hangs (as shown by ss), I was also getting this logged in dmesg:
nf_conntrack: nf_conntrack: table full, dropping packet
The connection tracking table was getting filled up! Each HTTP/1.1 request I was making was going out over its own new TCP connection, so every request added an entry to the table, and conntrack only expires entries for closed connections after a timeout, so they keep counting against the limit for a while. It all makes perfect sense now.
I'm running a fairly niche Linux distro, so I'm not sure whether my connection tracking table is smaller than average. There were also a few socket/resource leaks in my application that were exacerbating the issue, even across separate programs on my system. I haven't looked into increasing the size of this table yet. If you're running into this issue, keep in mind that the problem could also be a small connection tracking table on another device on the network, such as a router, switch, or firewall.
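If you want to see how close the table is to its limit, the current and maximum entry counts are exposed under /proc/sys/net/netfilter on mainline Linux kernels with the nf_conntrack module loaded (the maximum is the net.netfilter.nf_conntrack_max sysctl, which is the knob you would raise); the paths below assume that layout. A minimal sketch in Java:

// Minimal sketch: report how full the connection tracking table is.
// Assumes a mainline Linux kernel with the nf_conntrack module loaded,
// which exposes these counters under /proc/sys/net/netfilter.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ConntrackCheck {
    private static long readLong(String path) throws IOException {
        return Long.parseLong(Files.readString(Path.of(path)).trim());
    }

    public static void main(String[] args) throws IOException {
        long count = readLong("/proc/sys/net/netfilter/nf_conntrack_count");
        long max = readLong("/proc/sys/net/netfilter/nf_conntrack_max");
        System.out.println("conntrack entries: " + count + " / " + max);
    }
}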
I left my application running overnight with a very low poll interval to see what would happen, and when I woke up the Java compiler was failing because systemd had, it seems, filled /tmp with these nf_conntrack log messages. That's when I thought to check dmesg, and voilà! I then rebooted to clear out the tmpfs.
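On the application side, the other lever is to stop opening a new TCP connection per request, since a single reused connection consumes only one conntrack entry no matter how fast you poll. This is only a sketch under the assumption that the server keeps connections alive (the minimal server above closes the socket after every response, so it would have to be changed to honor HTTP/1.1 keep-alive); java.net.http.HttpClient (Java 11+) pools and reuses connections automatically:

// Minimal sketch of a polling client that reuses one TCP connection.
// Assumes the server supports HTTP/1.1 keep-alive instead of closing the
// socket after every response; with reuse, the conntrack entry count stays flat.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PollingClient {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient(); // pools and reuses connections
        HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8000"))
                .GET()
                .build();
        while (true) {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            System.out.println("Status Code: " + response.statusCode());
            Thread.sleep(1); // the 1 ms poll interval mentioned above
        }
    }
}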