performancetomcatwindows-server-2012-r2dspaceyourkit

What would cause Tomcat (v8) to CPU spike with periodic regularity


On a windows 2012 RT (x64) TEST server we are running a Tomcat 8 installation and the CPU usage is disconcerting in its regularity of hitting peak usage.

The behavior is happening after an installation of our application but before anyone is accessing it. I have accessed a few pages and tested some features but nothing that could create this behavior that I know of.

There are 2 virtual processors on the server and every ~20 seconds, the CPU usage will spike (on the one processor that is running Tomcat) to 100% for 10 seconds (give or take). See below:

Graph of periodic spikes

The regularity of the pattern indicates to me that something is incorrect in either the installation or the settings of Tomcat 8.

I have installed the YourKit Java Profiler (by SO recommendation) which I was hoping could shed some light on what is causing these spikes, but haven't been able to see the reason the threads are starting -- at least in part to my newness to YourKit. I did attach it to the Tomcat launch file and it seems to be tracking the behavior.

The catalina logs are silent during the spiking occurrences (as are my application logs) but when I stopped Tomcat there were some messages about ThreadLocals getting started but could not be removed and then: "...Threads are going to be renewed over time to try and avoid a probable memory leak."

I left the server running over the weekend and the pattern has continued until today so I don't think my symptoms are going away. Whatever is starting up has now consumed all available RAM on the system just from starting up these threads (and/or YourKit) every 20 seconds.

What is a possible approach to isolate this aberrant Tomcat activity and hopefully stop or rectify it?

There are many graphs and tabs in YourKit so I hesitate to list everything that might be helpful. Thanks for helping me narrow down the problem with what YourKit (or other tools) could offer me.

Info from catalina log regarding start-up:

Apache Tomcat/8.0.23
Architecture: amd64
Java Home: C:\Program Files\Java\jre1.8.0_65
CATALINA_BASE: C:\Program Files\Apache Software Foundation\Tomcat 8.0

2015-12-08 Update

At Gergely's request, the application is a local installation of DSpace. It's a Java application with a Postgres SQL database backend. We are customizing an opensource version of it from here: http://www.dspace.org/introducing. I'm not exactly sure what else can be helpful and I think the stack trace is more revealing as to what is (and isn't) running -- see below.

By turning on Stack Telemetry in YourKit, "CPU Estimation" was made available by dragging the cursor across a period of profiler history. To me, it looks like all the CPU is doing is spinning idly. Are the Java files pictured below Tomcat routines? They don't strike me as DSpace related (although I'm not an expert) nor does it look like any work is being done while the CPU is peaking.

Of note: the stack trace is identical during the quiet periods -- the only difference being CPU Time (ms) is in the hundreds rather than thousands of milliseconds. For a more direct comparison than what is below, the hump represents ~8,000 ms in Thread.run() and the quiet periods consume ~125 ms of cpu time (although covering approximately the same amount of time).

Lastly, when pages of the application are being requested, a subsequent branch of code appears in the Call Tree. If it happened during the time of a spike it may only take 400 ms of CPU time to load a whole page. The code branch that appears is ApplicationFilterChain.java as a whole separate branch alongside PooledExecutor$Worker.run() -- both underneath java.lang.Thread.run() in the hierarchy.

When trying to interpret the stack trace: Is EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run() responsible?

Processor spikes with no known, associated activity

CPU Profiling

2015-12-08 Update #2

YourKit comes pre-configured to hide certain java class name patterns which obscured drilling down on java.lang.Thread. Clearing the filters enabled the following screenshots showing that the vast majority of processing time during a spike event is through calling the following 3 methods:

My apologies for not yet knowing enough about Tomcat or DSpace to know who is launching these tasks. (In case it matters the line directly above the first line is java.lang.Thread.run() and then <All threads>)

better stack trace


Solution

  • Thank you to those who has viewed and responded to this inquiry. As various individuals have surmised, the problem was related to our settings and use of Tomcat -- not a problem with Tomcat itself (most likely).

    This is an attempt to answer the question without perfect knowledge at installing the DSpace application and Tomcat but I think I know enough to be dangerous and potentially helpful to follow-up users.

    When installing the application DSpace there are some installation properties in Tomcat's configuration directories that determine whether or not to allow for changes in coding files to be reflected immediately without a Tomcat restart. These settings for us were previously in the directory [tomcat]/conf/Catalina/localhost/ and each of the three files contained a small, insignificant XML file like (e.g. oai.xml):

    <?xml version='1.0'?>
    <Context docBase="E:/dspace/webapps/oai"
        reloadable="false"
        cachingAllowed="true"/>
    

    You can find documentation on these properties at the following link: https://wiki.duraspace.org/display/DSDOC5x/Installing+DSpace

    Within that documentation is a recommendation about the reloadable and cachingAllowed properties. Search for "Tomcat Context Settings in Production". Here is an excerpt (emphasis mine):

    These settings are extremely useful to have when you are first getting started with DSpace, as they let you tweak the DSpace XMLUI (XSLTs or CSS) or JSPUI (JSPs) and see your changes get automatically reloaded by Tomcat (without having to restart Tomcat). However, it is worth noting that the Apache Tomcat documentation recommends Production sites leave the default values in place (reloadable="false" cachingAllowed="true"), as allowing Tomcat to automatically reload all changes may result in "significant runtime overhead".

    It is entirely up to you whether to keep these Tomcat settings in place. We just recommend beginning with them, so that you can more easily customize your site without having to require a Tomcat restart. Smaller DSpace sites may not notice any performance issues with keeping these settings in place in Production. Larger DSpace sites may wish to ensure that Tomcat performance is more streamlined.

    When I switched these boolean flags to reloadable="false" and cachingAllowed="true" the spiked CPU experience stopped immediately. I don't know if the warning about "Larger sites" applies to us or whether "streamlined performance" could refer to the negative activity I observed.

    I presume there may be other problems with our installation that allowed this particular manifestation; one ominous clue is that our production server seems to be operating with these flags in the reloadable="true" configuration. Java, Tomcat, Windows, AND DSpace are ALL getting new versions at the same time so it is fairly difficult to pinpoint why similar Tomcat <context> settings produce such different results.

    I am at least content for now to have new behavior and that the system has calmed down. I'll post more if I learn more but will be focusing next on other quandaries.

    Update

    FWIW, the attributes are settings that directly control Tomcat and they have changed between versions. E.g., cachingAllowed was removed in version 8 which means it can be removed from the Context elements. Compare:

    https://tomcat.apache.org/tomcat-8.0-doc/config/context.html#Attributes https://tomcat.apache.org/tomcat-7.0-doc/config/context.html#Attributes

    And for good measure, here is the help text for reloadable in the Tomcat 8 documentation:

    Set to true if you want Catalina to monitor classes in /WEB-INF/classes/ and /WEB-INF/lib for changes, and automatically reload the web application if a change is detected. This feature is very useful during application development, but it requires significant runtime overhead and is not recommended for use on deployed production applications. That's why the default setting for this attribute is false. You can use the Manager web application, however, to trigger reloads of deployed applications on demand.

    So it would seem that the ultimate answer is that Tomcat 8 on Windows 2012-R2 with the flag reloadable='true' polls for changes to WEB-INF/lib and WEB-INF/classes. The volume of the folders and files to peruse may very well be the cause of these intense, spiked CPU events. For now I will be relying on reloadable='false' which definitely removes the symptom for us.