Tags: java, performance, cpu-usage, flamegraph

Detecting the cause of a 15% increase in CPU usage in a Java web application


Recently we upgraded the Hibernate jars in our Java web application. After this upgrade we found a 15-20% increase in CPU usage. The only difference before and after is the Hibernate jar version. I need to determine the root cause of this increase in CPU usage. I took jvisualvm CPU profiler snapshots and converted them into flame graphs. The two flame graphs show the same stack traces, but with a difference in the CPU percentages.

A load test with the same user load and use case was executed before and after the upgrade. The only difference between the two application deployments is the Hibernate jars: one deployment has Hibernate 4.3.5 and the other has 5.4.2. The flame graphs do not pinpoint Hibernate functions as the cause of the increase in CPU usage, so I am unsure how to proceed with the analysis.

I need some guidance on how to compare the two flame graphs and troubleshoot the root cause of the increase in CPU usage. Kindly find the flame graphs and profiler snapshots at the links below.

Sampling for 5 mins

Flame graph before - https://filebin.net/sxagcfs6lmtie654/old_report_5min.html?t=k4t2i379

Flame graph after - https://filebin.net/sxagcfs6lmtie654/report_5min.html?t=k4t2i379

Profiler snapshot before - https://filebin.net/sxagcfs6lmtie654/Old_Hibernate_Sampling_5Minute.nps?t=fvno95sr

Profiler snapshot after - https://filebin.net/sxagcfs6lmtie654/New_Hibernate_Sampling_5Minute.nps?t=fvno95sr

Sampling for 30 mins

Flame graph before - https://filebin.net/sxagcfs6lmtie654/old_report_30min.html?t=ttb7s4k4

Flame graph after - https://filebin.net/sxagcfs6lmtie654/report_30min.html?t=ttb7s4k4

Profiler snapshot before - https://filebin.net/sxagcfs6lmtie654/OldHibernateLibrary_30min.nps?t=fvno95sr

Profiler snapshot after - https://filebin.net/sxagcfs6lmtie654/LatestHibernateLibrary_30min.nps?t=fvno95sr


Solution

  • A good way to compare two sampling sessions is to compare their hot-method histograms. This can be done in VisualVM or with the following SJK commands.

    sjk ssa --histo --by-term -f OldHibernateLibrary_30min.nps
    Trc     (%)  Frm  N  Term   (%)  Frame
     64450  53%  64450   64450  53%  java.util.concurrent.LinkedBlockingQueue.poll(Unknown Source)
     22503  18%  22503   22503  18%  sun.nio.ch.SocketChannelImpl.read(Unknown Source)
      8954   7%  8954     8954   7%  sun.nio.ch.SelectorImpl.select(Unknown Source)
      6943   5%  6943     6943   5%  java.lang.ClassLoader.loadClass(Unknown Source)
      3828   3%  3828     3828   3%  java.lang.Thread.sleep(Native Method)
      1918   1%  1918     1918   1%  java.lang.Object.wait(Native Method)
      1674   1%  1674     1674   1%  sun.nio.ch.SocketChannelImpl.write(Unknown Source)
    ...

    sjk ssa --histo --by-term -f LatestHibernateLibrary_30min.nps
    Trc     (%)  Frm  N  Term   (%)  Frame
     60427  44%  60427   60427  44%  java.util.concurrent.LinkedBlockingQueue.poll(Unknown Source)
     28568  21%  28568   28568  21%  java.lang.ClassLoader.loadClass(Unknown Source)
     23072  17%  23072   23072  17%  sun.nio.ch.SocketChannelImpl.read(Unknown Source)
      6181   4%  6181     6181   4%  sun.nio.ch.SelectorImpl.select(Unknown Source)
      3030   2%  3030     3030   2%  java.lang.Thread.sleep(Native Method)
      1542   1%  1542     1542   1%  sun.nio.ch.SocketChannelImpl.write(Unknown Source)
      1451   1%  1451     1451   1%  java.util.concurrent.LinkedBlockingQueue.take(Unknown Source)
    ...
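
    For context, this is essentially what a thread-dump based sampler (the VisualVM sampler, or SJK's ssa histogram) is counting: the terminal frame of every thread's stack, tallied over many samples. Below is a minimal sketch of such a sampler, assuming a plain ThreadMXBean approach; the class name and sampling interval are made up for illustration.

    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative sketch: count the terminal (topmost) frame of every thread
    // over repeated samples. Real tools also keep full stack traces and thread
    // states rather than just the terminal frame.
    public class TerminalFrameHistogram {

        public static void main(String[] args) throws InterruptedException {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            Map<String, Integer> histogram = new HashMap<>();

            // Sample all thread stacks every 20 ms for roughly 30 seconds.
            for (int i = 0; i < 1500; i++) {
                for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
                    StackTraceElement[] stack = info.getStackTrace();
                    if (stack.length > 0) {
                        String terminal = stack[0].getClassName() + "." + stack[0].getMethodName();
                        histogram.merge(terminal, 1, Integer::sum);
                    }
                }
                Thread.sleep(20);
            }

            // Print the hot-method histogram, most frequent terminal frames first.
            histogram.entrySet().stream()
                    .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                    .limit(10)
                    .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
        }
    }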
    

    Briefly browsing through the histograms, I can spot java.lang.ClassLoader.loadClass growing from 5% to 21% (please mind that these are percentages of the total sample count; they do not translate directly into CPU usage).

    Assuming both snapshots were taken under the same load (which I cannot verify from the VisualVM snapshots), I can conclude that java.lang.ClassLoader.loadClass is the culprit in the CPU usage degradation.

    Filtering the histogram further:

    sjk ssa --histo --by-term -f OldHibernateLibrary_30min.nps -tf java.lang.ClassLoader.loadClass
    sjk ssa --histo --by-term -f LatestHibernateLibrary_30min.nps -tf java.lang.ClassLoader.loadClass

    I can see no difference between the old and the new version, i.e. the usage pattern stays the same between versions.

    From the histogram I can see that all paths to java.lang.ClassLoader.loadClass go through org.springframework.orm.jpa.SharedEntityManagerCreator$SharedEntityManagerInvocationHandler.invoke, so the problematic path is below:

    java.lang.ClassLoader.loadClass(Unknown Source)
    org.springframework.util.ClassUtils.isVisible(Unknown Source)
    org.springframework.util.ClassUtils.getAllInterfacesForClassAsSet(Unknown Source)
    org.springframework.util.ClassUtils.getAllInterfacesForClassAsSet(Unknown Source)
    org.springframework.orm.jpa.ExtendedEntityManagerCreator.createProxy(Unknown Source)
    org.springframework.orm.jpa.ExtendedEntityManagerCreator.createProxy(Unknown Source)
    org.springframework.orm.jpa.ExtendedEntityManagerCreator.createApplicationManagedEntityManager(Unknown Source)
    org.springframework.orm.jpa.AbstractEntityManagerFactoryBean.invokeProxyMethod(Unknown Source)
    org.springframework.orm.jpa.AbstractEntityManagerFactoryBean$ManagedEntityManagerFactoryInvocationHandler.invoke(Unknown Source)
    com.sun.proxy.$Proxy671.createEntityManager(Unknown Source)
    com.spmsoftware.appframework.persistence.MultitenantEntityManagerFactory.createEntityManager(Unknown Source)
    org.springframework.orm.jpa.SharedEntityManagerCreator$SharedEntityManagerInvocationHandler.invoke(Unknown Source)
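
    According to this trace, invocations going through the shared EntityManager proxy end up creating another EntityManager proxy, and the interface-gathering step (ClassUtils.getAllInterfacesForClassAsSet) checks class visibility against a ClassLoader, which is where loadClass appears. A simplified sketch of such a visibility check, for illustration only (not the actual Spring source):

    // Illustration only: a visibility check resolves the class by name through
    // the given ClassLoader and compares the result with the original class.
    // Even when the class is already loaded, every proxy creation pays for this
    // by-name lookup, and possibly for the per-name class-loading lock behind it.
    static boolean isVisibleTo(Class<?> clazz, ClassLoader classLoader) {
        if (classLoader == null) {
            return true; // no specific loader to check against
        }
        try {
            return classLoader.loadClass(clazz.getName()) == clazz;
        } catch (ClassNotFoundException ex) {
            return false; // not visible through this loader
        }
    }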
    

    Conclusions

    VisualVM sampling shows increased time spent in the java.lang.ClassLoader.loadClass method. Unfortunately, it is a limitation of thread-dump based sampling that you cannot peek inside a native method.

    java.lang.ClassLoader.loadClass utilization is extremely high in both the old and the new version, which makes me suspect some misconfiguration in the framework.

    The high java.lang.ClassLoader.loadClass times are likely the result of contention between threads rather than real CPU usage. Still, the relative change in this metric gives us a reason to consider it related to the root cause of the CPU usage growth.
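
    To make the contention point concrete: a thread-dump sampler counts a thread as being "in" loadClass whether it is burning CPU there or waiting on a class-loading lock. Here is a minimal, hypothetical stress test to illustrate this; the thread count, class name, and duration are made up:

    import java.util.concurrent.atomic.LongAdder;

    // Hypothetical demo: many threads repeatedly resolving an already-loaded
    // class through the same ClassLoader. Thread-dump samples may frequently
    // land in ClassLoader.loadClass for these threads even when a large share
    // of that time is lock waiting rather than CPU work.
    public class ClassLoadContentionDemo {

        public static void main(String[] args) throws Exception {
            ClassLoader loader = ClassLoadContentionDemo.class.getClassLoader();
            LongAdder lookups = new LongAdder();

            for (int i = 0; i < 16; i++) {
                Thread t = new Thread(() -> {
                    while (true) {
                        try {
                            // Repeated by-name lookup of a class that is already
                            // loaded, similar in spirit to the per-call visibility
                            // checks seen in the trace above.
                            loader.loadClass("java.util.concurrent.ConcurrentHashMap");
                            lookups.increment();
                        } catch (ClassNotFoundException e) {
                            throw new AssertionError(e);
                        }
                    }
                });
                t.setDaemon(true);
                t.start();
            }

            Thread.sleep(30_000); // attach VisualVM or SJK and sample while this runs
            System.out.println("lookups completed: " + lookups.sum());
        }
    }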