Recently we upgraded the Hibernate jars in our Java web application. After this upgrade we found a 15-20% increase in CPU usage. The only difference before and after is the Hibernate jar version. I need to determine the root cause of this increase in CPU usage. I took jvisualvm CPU profiler snapshots and converted them into flame graphs. From the two flame graphs I see that the stack traces are the same, but there is a difference in the CPU %.
A load test with the same user load and use case was executed before and after the upgrade. The only difference between the two application deployments is the Hibernate jars: one version has Hibernate 4.3.5 and the other has 5.4.2. The flame graphs do not pinpoint Hibernate functions as the cause of the increased CPU usage, so I am unsure how to proceed with the analysis.
I need some guidance on how to compare the two flame graphs and troubleshoot the root cause of the increase in CPU usage. Kindly find the flame graphs and profiler snapshots at the links below.
Sampling for 5 mins
Flame graph before - https://filebin.net/sxagcfs6lmtie654/old_report_5min.html?t=k4t2i379
Flame graph after - https://filebin.net/sxagcfs6lmtie654/report_5min.html?t=k4t2i379
Profiler snapshot before - https://filebin.net/sxagcfs6lmtie654/Old_Hibernate_Sampling_5Minute.nps?t=fvno95sr
Profiler snapshot after - https://filebin.net/sxagcfs6lmtie654/New_Hibernate_Sampling_5Minute.nps?t=fvno95sr
Sampling for 30 mins
Flame graph before - https://filebin.net/sxagcfs6lmtie654/old_report_30min.html?t=ttb7s4k4
Flame graph after - https://filebin.net/sxagcfs6lmtie654/report_30min.html?t=ttb7s4k4
Profiler snapshot before - https://filebin.net/sxagcfs6lmtie654/OldHibernateLibrary_30min.nps?t=fvno95sr
Profiler snapshot after - https://filebin.net/sxagcfs6lmtie654/LatestHibernateLibrary_30min.nps?t=fvno95sr
A good way to compare two sampling sessions is to compare hot method histograms. This can be done in VisualVM or with the following SJK commands.
sjk ssa --histo --by-term -f OldHibernateLibrary_30min.nps
Trc (%) Frm N Term (%) Frame
64450 53% 64450 64450 53% java.util.concurrent.LinkedBlockingQueue.poll(Unknown Source)
22503 18% 22503 22503 18% sun.nio.ch.SocketChannelImpl.read(Unknown Source)
8954 7% 8954 8954 7% sun.nio.ch.SelectorImpl.select(Unknown Source)
6943 5% 6943 6943 5% java.lang.ClassLoader.loadClass(Unknown Source)
3828 3% 3828 3828 3% java.lang.Thread.sleep(Native Method)
1918 1% 1918 1918 1% java.lang.Object.wait(Native Method)
1674 1% 1674 1674 1% sun.nio.ch.SocketChannelImpl.write(Unknown Source)
...
sjk ssa --histo --by-term -f LatestHibernateLibrary_30min.nps
Trc (%) Frm N Term (%) Frame
60427 44% 60427 60427 44% java.util.concurrent.LinkedBlockingQueue.poll(Unknown Source)
28568 21% 28568 28568 21% java.lang.ClassLoader.loadClass(Unknown Source)
23072 17% 23072 23072 17% sun.nio.ch.SocketChannelImpl.read(Unknown Source)
6181 4% 6181 6181 4% sun.nio.ch.SelectorImpl.select(Unknown Source)
3030 2% 3030 3030 2% java.lang.Thread.sleep(Native Method)
1542 1% 1542 1542 1% sun.nio.ch.SocketChannelImpl.write(Unknown Source)
1451 1% 1451 1451 1% java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
...
Briefly browsing through the histograms, I can spot java.lang.ClassLoader.loadClass growing from 5% to 21% (mind that these are percentages of the total sample count; they do not translate directly into CPU usage).
Assuming both snapshots were taken under the same load (which I cannot verify from the VisualVM snapshots), I can conclude that java.lang.ClassLoader.loadClass is the culprit in the CPU usage degradation.
Filtering the histogram further:
sjk ssa --histo --by-term -f OldHibernateLibrary_30min.nps -tf java.lang.ClassLoader.loadClass
sjk ssa --histo --by-term -f LatestHibernateLibrary_30min.nps -tf java.lang.ClassLoader.loadClass
I see no difference between the old and the new version, i.e. the usage pattern stays the same across versions.
From the histogram I can see that all paths to java.lang.ClassLoader.loadClass go through org.springframework.orm.jpa.SharedEntityManagerCreator$SharedEntityManagerInvocationHandler.invoke, so the problematic path is the one below (a small timing sketch follows the stack trace):
java.lang.ClassLoader.loadClass(Unknown Source)
org.springframework.util.ClassUtils.isVisible(Unknown Source)
org.springframework.util.ClassUtils.getAllInterfacesForClassAsSet(Unknown Source)
org.springframework.util.ClassUtils.getAllInterfacesForClassAsSet(Unknown Source)
org.springframework.orm.jpa.ExtendedEntityManagerCreator.createProxy(Unknown Source)
org.springframework.orm.jpa.ExtendedEntityManagerCreator.createProxy(Unknown Source)
org.springframework.orm.jpa.ExtendedEntityManagerCreator.createApplicationManagedEntityManager(Unknown Source)
org.springframework.orm.jpa.AbstractEntityManagerFactoryBean.invokeProxyMethod(Unknown Source)
org.springframework.orm.jpa.AbstractEntityManagerFactoryBean$ManagedEntityManagerFactoryInvocationHandler.invoke(Unknown Source)
com.sun.proxy.$Proxy671.createEntityManager(Unknown Source)
com.spmsoftware.appframework.persistence.MultitenantEntityManagerFactory.createEntityManager(Unknown Source)
org.springframework.orm.jpa.SharedEntityManagerCreator$SharedEntityManagerInvocationHandler.invoke(Unknown Source)
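To get a feel for how expensive this path can be, here is a rough stand-alone sketch (my own illustration, not taken from the snapshots; the target class and iteration count are arbitrary) that times the ClassUtils interface/visibility scan which ExtendedEntityManagerCreator.createProxy goes through on the path shown above, assuming spring-core is on the classpath:

import java.util.Set;

import org.springframework.util.ClassUtils;

// Rough micro-check of the hot path seen in the stack trace above:
// ClassUtils.getAllInterfacesForClassAsSet(clazz, classLoader) checks the
// visibility of every interface against the given ClassLoader, which is
// where the java.lang.ClassLoader.loadClass samples come from.
public class InterfaceScanCost {

    public static void main(String[] args) {
        ClassLoader cl = InterfaceScanCost.class.getClassLoader();
        // Placeholder target class; in the real application this would be the
        // EntityManager implementation being proxied.
        Class<?> target = java.util.concurrent.ConcurrentHashMap.class;
        int iterations = 100_000; // arbitrary

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Set<Class<?>> ifcs = ClassUtils.getAllInterfacesForClassAsSet(target, cl);
            if (ifcs.isEmpty()) {
                throw new IllegalStateException("expected at least one interface");
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(iterations + " interface scans took " + elapsedMs + " ms");
    }
}

If the per-call cost looks small in isolation, the question becomes how often this path is hit per request in your deployment, which the sample counts above suggest changed between the two Hibernate versions.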
Conclusions
VisualVM sampling shows increased time spent in the java.lang.ClassLoader.loadClass method. Unfortunately, this is a limitation of thread-dump-based sampling: you cannot peek inside a native method.
java.lang.ClassLoader.loadClass utilization is extremely high in both the old and the new version, which makes me suspect some misconfiguration in the framework.
java.lang.ClassLoader.loadClass high times are likely the result of contention between threads rather than real CPU usage. Still, the relative change in this metric gives us a reason to consider it related to the root cause of the CPU usage growth.
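To double-check the contention-vs-CPU point, one option (my own sketch, not part of the original analysis) is to sum per-thread CPU time over the load-test window with ThreadMXBean, running inside or attached to the application JVM, and compare the deltas between the two deployments; threads blocked on class-loading locks inflate wall-clock samples but burn little CPU:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Sketch: measure total CPU time burned by all live threads over a window.
// Comparing the deltas for the old and new deployments under the same load
// separates real CPU growth from time spent waiting on locks, since blocked
// threads accumulate wall-clock time but almost no CPU time.
public class CpuTimeDelta {

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long windowMs = 60_000; // measurement window, arbitrary

        long before = totalCpuNanos(mx);
        Thread.sleep(windowMs);
        long after = totalCpuNanos(mx);

        System.out.printf("CPU time burned in %d ms window: %d ms%n",
                windowMs, (after - before) / 1_000_000);
    }

    private static long totalCpuNanos(ThreadMXBean mx) {
        long total = 0;
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id); // -1 if unsupported or thread has died
            if (cpu > 0) {
                total += cpu;
            }
        }
        return total;
    }
}

Threads that terminate during the window drop out of the total, so treat the numbers as a rough comparison between the two deployments rather than an exact measurement.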