I tried every thing to find the cause but not sure whats happing, tried to jProfile, but its not getting any results as cpu is already at 100%.
I am running a Spring application (WAR file) on amazon linux on Tomcat8. no database operations but yes it access a rest api for different operations running in separate environment. There are 3 servers have same issue running behind a load balancer in a beanstalk. with average of 518K requests per hour and average latency of 72.2 milliseconds
I feel like there is not much issue with the code itself
I just tried a command kill -3 pid (java process), and got some results in catalina.out file. I can see following error again and again
"http-nio-8080-exec-13" #42 daemon prio=5 os_prio=0 tid=0x00007f0898005800 nid=0xfb2 waiting on condition [0x00007f0882dec000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
"http-nio-8080-exec-18" #41 daemon prio=5 os_prio=0 tid=0x00007f088c024000 nid=0xfb1 waiting on condition [0x00007f0882eed000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
and at the end is following information
"VM Thread" os_prio=0 tid=0x00007f08d8081000 nid=0xf8b runnable
"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007f08d801e800 nid=0xf89 runnable
"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007f08d8020800 nid=0xf8a runnable
"VM Periodic Task Thread" os_prio=0 tid=0x00007f08d80f0000 nid=0xf92 waiting on condition
JNI global references: 478
Heap
PSYoungGen total 85504K, used 28368K [0x00000000fab00000, 0x0000000100000000, 0x0000000100000000)
eden space 83968K, 33% used [0x00000000fab00000,0x00000000fc61cd08,0x00000000ffd00000)
from space 1536K, 39% used [0x00000000ffd00000,0x00000000ffd97340,0x00000000ffe80000)
to space 1536K, 0% used [0x00000000ffe80000,0x00000000ffe80000,0x0000000100000000)
ParOldGen total 175104K, used 107142K [0x00000000f0000000, 0x00000000fab00000, 0x00000000fab00000)
object space 175104K, 61% used [0x00000000f0000000,0x00000000f68a1a48,0x00000000fab00000)
Metaspace used 56563K, capacity 60632K, committed 60800K, reserved 1103872K
class space used 5218K, capacity 5736K, committed 5760K, reserved 1048576K
Could someone please explain what is happening here?
Ok here are some of the logs I pulled from the log which are in
running
state
(couple of times in log)
"ajp-nio-8009-Acceptor-0" #23 daemon prio=5 os_prio=0 tid=0x00007f08d8535800 nid=0xfa0 runnable [0x00007f0883ffe000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
- locked <0x00000000f0881618> (a java.lang.Object)
at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
at java.lang.Thread.run(Thread.java:745)
"ajp-nio-8009-ClientPoller-1" #22 daemon prio=5 os_prio=0 tid=0x00007f08d8534000 nid=0xf9f runnable [0x00007f08a83b2000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000f16a8100> (a sun.nio.ch.Util$2)
- locked <0x00000000f16a80f0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000f16a7fc8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
at java.lang.Thread.run(Thread.java:745)
"ajp-nio-8009-ClientPoller-0" #21 daemon prio=5 os_prio=0 tid=0x00007f08d82f8800 nid=0xf9e runnable [0x00007f08a84b3000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000f16a8900> (a sun.nio.ch.Util$2)
- locked <0x00000000f16a88f0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000f16a87c8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
at java.lang.Thread.run(Thread.java:745)
"http-nio-8080-Acceptor-0" #20 daemon prio=5 os_prio=0 tid=0x00007f08d82f7000 nid=0xf9d runnable [0x00007f08a85b4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
- locked <0x00000000f0882f68> (a java.lang.Object)
at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
at java.lang.Thread.run(Thread.java:745)
"http-nio-8080-ClientPoller-1" #19 daemon prio=5 os_prio=0 tid=0x00007f08d82f5800 nid=0xf9c runnable [0x00007f08a86b5000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000f1677900> (a sun.nio.ch.Util$2)
- locked <0x00000000f16778f0> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000f16777a8> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
at java.lang.Thread.run(Thread.java:745)
"http-nio-8080-ClientPoller-0" #18 daemon prio=5 os_prio=0 tid=0x00007f08d82f4000 nid=0xf9b runnable [0x00007f08a87b6000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
- locked <0x00000000f1666290> (a sun.nio.ch.Util$2)
- locked <0x00000000f1666280> (a java.util.Collections$UnmodifiableSet)
- locked <0x00000000f1666138> (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
at org.apache.tomcat.util.net.NioEndpoint$Poller.run(NioEndpoint.java:1034)
at java.lang.Thread.run(Thread.java:745)
There is another one Not sure if its causing a problem (but only once in log)
"http-nio-8080-exec-131" #160 daemon prio=5 os_prio=0 tid=0x00007f088c100800 nid=0x153e runnable [0x00007f08727e5000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:170)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
- locked <0x00000000f688f4d0> (a java.io.BufferedInputStream)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:704)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:647)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1536)
- locked <0x00000000f6892258> (a sun.net.www.protocol.http.HttpURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1441)
- locked <0x00000000f6892258> (a sun.net.www.protocol.http.HttpURLConnection)
at sun.net.www.protocol.http.HttpURLConnection.getHeaderFields(HttpURLConnection.java:2966)
at com.code.http.WebUtility.getUrlContents(WebUtility.java:163)
and the code on this lines is con.getHeaderFields()
, where as con is HttpURLConnection
try {
con = (HttpURLConnection) url.opencon();
con.setDoInput(true);
con.setDoOutput(true);
con.connect();
writeJson (con);
this.header = con.getHeaderFields();//this is the line in running state
this.code = con.getResponseCode();
return readSuccessStream(con);
} catch (IOException e) {
if (con != null) {
return readFailureStream(con);
}
return "a nasty error occured";
} finally {
if (con != null) {
con.disconnect();
}
}
and few other treads in running state
"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007f08d80db000 nid=0xf91 runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007f08d80c8800 nid=0xf90 waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007f08d80bb000 nid=0xf8f waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007f08d80b9000 nid=0xf8e waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
and the last one
"main" #1 prio=5 os_prio=0 tid=0x00007f08d8009800 nid=0xf88 runnable [0x00007f08de871000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
at java.net.ServerSocket.implAccept(ServerSocket.java:545)
at java.net.ServerSocket.accept(ServerSocket.java:513)
at org.apache.catalina.core.StandardServer.await(StandardServer.java:446)
at org.apache.catalina.startup.Catalina.await(Catalina.java:717)
at org.apache.catalina.startup.Catalina.start(Catalina.java:663)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:351)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:485)
Thread states
The states of a thread in a Java virtual machine are:
NEW
A thread that has not yet started is in this state.
RUNNABLE
A thread executing in the Java virtual machine is in this state.
BLOCKED
A thread that is blocked waiting for a monitor lock is in this state.
WAITING
A thread that is waiting indefinitely for another thread to perform a particular action is in this state.
TIMED_WAITING
A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state.
TERMINATED
A thread that has exited is in this state.
In Java, threads are represented as Thread objects (java.lang.Thread). When you create a new thread, literally with new Thread()
, you create a thread, but it isn't doing anything yet. Once you call Thread.start()
, it moves from the NEW
state into the RUNNABLE
state. Runnable means it has pending work for the CPU to perform. While running, a thread can move between RUNNABLE
, BLOCKED
, WAITING
, and TIMED_WAITING
. When a thread exits, it is left in the TERMINATED
state.
When trying to diagnose high CPU usage by a Java process, we can rule out threads which are NEW
or TERMINATED
since they are dead. A thread which is BLOCKED
, WAITING
, or TIMED_WAITING
are not doing anything. They are, literally, waiting for some event in the future.
What does that mean? Only threads in the RUNNABLE
state can be putting load on the CPU.
What are all those threads?
Tomcat uses a pool of threads to be able to handle multiple requests at the same time.
This first thread is the acceptor thread. Its purpose is to listen on the network socket for incoming requests. But in order to handle multiple requests simultaneously, it needs to delegate the work. Otherwise, each request would lock up the server until it finishes, meaning only one user can connect to the web server at a time. So this thread simple accepts the connections and then hands them off to another thread so it can accept more connections.
"http-nio-8080-Acceptor-0" #20 daemon prio=5 os_prio=0 tid=0x00007f08d82f7000 nid=0xf9d runnable [0x00007f08a85b4000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
- locked <0x00000000f0882f68> (a java.lang.Object)
at org.apache.tomcat.util.net.NioEndpoint$Acceptor.run(NioEndpoint.java:682)
at java.lang.Thread.run(Thread.java:745)
These threads are the worker threads to which Tomcat assigns the requests:
"http-nio-8080-exec-13" #42 daemon prio=5 os_prio=0 tid=0x00007f0898005800 nid=0xfb2 waiting on condition [0x00007f0882dec000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000000f1694f58> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
at java.util.concurrent.LinkedBlockingQueue.poll(LinkedBlockingQueue.java:467)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:85)
at org.apache.tomcat.util.threads.TaskQueue.poll(TaskQueue.java:31)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:745)
That means the thread is sleeping. It is waiting for the acceptor thread to give it work. It is contributing nothing to your 100% CPU usage.
Conclusion
The runnable threads I saw in your thread dumps show that reading from a network socket is where the CPU load is spent. I cannot say definitively why. There are two possible reasons: a problem with the network or remote system being extremely latent or reading is a problem due to memory usage.
Most of the time when I have seen this, the CPU work was coming from the garbage collector, which runs in another thread. When there is pressure to allocate memory and little is available, the garbage collector has to do a lot of expensive work to find available memory while the other threads wait. This thread might not be visible in the thread dumps you showed since it's not a thread created in Java code, but is an internal part of the Java virtual machine. I can't say for sure if that is the issue in your case. It could be a network problem or problem with the remote system you are reading from. I suggest looking carefully at every thread to try to find a pattern.