One of the threads has a lock for more than 3 seconds when querying Oracle Database. This causes many blocked threads when accesing Oracle database, and hence sudden increases in number of threads and unresposiveness of application. Im am using Tomcat 8.5, Tomcat connection pool, Java 8. Trace for blocking thread:
***"http-nio-80-exec-433" #4207 daemon prio=5 os_prio=0 tid=0x00007fd9d8042000 nid=0x503b runnable [0x00007fd839f04000]
java.lang.Thread.State: RUNNABLE
at java.util.Hashtable.get(Hashtable.java:363)
- locked <0x000000070193caa0> (a java.util.Hashtable)
at java.lang.ConditionalSpecialCasing.lookUpTable(ConditionalSpecialCasing.java:151)
at java.lang.ConditionalSpecialCasing.toUpperCaseEx(ConditionalSpecialCasing.java:123)
at java.lang.String.toUpperCase(String.java:2775)
at java.lang.String.toUpperCase(String.java:2833)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1638)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4401)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4482)
- locked <0x000000074cd7d868> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:6272)
at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tomcat.jdbc.pool.interceptor.AbstractQueryReport$StatementProxy.invoke(AbstractQueryReport.java:210)
at com.sun.proxy.$Proxy637.executeQuery(Unknown Source)
at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
at com.sun.proxy.$Proxy637.executeQuery(Unknown Source)
at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)
at org.hibernate.loader.Loader.getResultSet(Loader.java:1953)
at org.hibernate.loader.Loader.doQuery(Loader.java:802)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274)
at org.hibernate.loader.Loader.doList(Loader.java:2533)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276)
at org.hibernate.loader.Loader.list(Loader.java:2271)***
Here is a trace for one of the 10+ BLOCKED threads
***"http-nio-80-exec-271" #2777 daemon prio=5 os_prio=0 tid=0x00007fd9c8941800 nid=0x19c3 waiting for monitor entry [0x00007fd8356ca000]
java.lang.Thread.State: BLOCKED (on object monitor)
at java.util.Hashtable.get(Hashtable.java:363)
- waiting to lock <0x000000070193caa0> (a java.util.Hashtable)
at java.lang.ConditionalSpecialCasing.lookUpTable(ConditionalSpecialCasing.java:151)
at java.lang.ConditionalSpecialCasing.toUpperCaseEx(ConditionalSpecialCasing.java:123)
at java.lang.String.toUpperCase(String.java:2775)
at java.lang.String.toUpperCase(String.java:2833)
at oracle.jdbc.driver.CharCommonAccessor.init(CharCommonAccessor.java:164)
at oracle.jdbc.driver.VarcharAccessor.<init>(VarcharAccessor.java:88)
at oracle.jdbc.driver.T4CVarcharAccessor.<init>(T4CVarcharAccessor.java:108)
at oracle.jdbc.driver.T4CTTIdcb.fillupAccessors(T4CTTIdcb.java:431)
at oracle.jdbc.driver.T4CTTIdcb.receiveCommon(T4CTTIdcb.java:209)
at oracle.jdbc.driver.T4CTTIdcb.receive(T4CTTIdcb.java:145)
at oracle.jdbc.driver.T4C8Oall.readDCB(T4C8Oall.java:963)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:447)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:235)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:543)
at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:239)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:1246)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1500)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1717)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:4401)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:4482)
- locked <0x000000074d203f60> (a oracle.jdbc.driver.T4CConnection)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:6272)
at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tomcat.jdbc.pool.interceptor.AbstractQueryReport$StatementProxy.invoke(AbstractQueryReport.java:210)
at com.sun.proxy.$Proxy637.executeQuery(Unknown Source)
at sun.reflect.GeneratedMethodAccessor400.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.tomcat.jdbc.pool.StatementFacade$StatementProxy.invoke(StatementFacade.java:114)
at com.sun.proxy.$Proxy637.executeQuery(Unknown Source)
at org.hibernate.jdbc.AbstractBatcher.getResultSet(AbstractBatcher.java:208)***
I have no idea why toUpperCase() would lock something (it is some Integer object being locked, as far as i found) for 30+ seconds, but this keeps occuring multiple times per day. Thread dump analysers did not found any deadlocks in dump. Tomcat pools logs that query for blocking thread http-nio-80-exec-433 took 5 minutes to complete.
Could this be a problem with jvm, memory or something else? Like jdbc driver or connection pool configuration problem?
It appears the problem was not code related. We had 10GB size catalina.out log file, and 4 bash scripts which analyzed that file for specific errors every five minutes, and because of large file size, each such analysis (mostly tail/wc commands) took 3-4 minutes. I do not know if catalina.out was being locked, but CPU usage for "tail" and "wc" commands was quite significant. Memory usage did not increase significantly though.
After manually rolling catalina.out, the problem is gone. Admin has been tasked with figuring out why logrotate is failing.
Update: The problem kept reappearing under higher loads (>50 users), so after some testing, locale was changed from "lt" to "en". Together with fixing another MyFaces cache bug, response times from Tomcat decreased 10-20 times, and number of concurrent users that can use application increased >10 times.