I have been attempting to resolve an issue where the execution of certain scripts causes a deadlock, putting all subsequent requests into limbo, using up 99.9% of the CPU, and eventually effectively crashing the server.
Here is an example stack trace for one of the requests which has been put in limbo (waiting eternally):
Thread Stack Trace
Trace Time: 21:00:44.463 06-Jun-2012
Request ID: 6131
Script Name: http://www.example.com/allreviews.cfm
Started: 21:00:21.225 06-Jun-2012
Exec Time: 23238ms
Memory Used: (24%)230,667KB
Memory Free: 701,428KB
Thread ID: 0x191e (6430)
Thread Name: jrpp-494
Priority: 5
Hashcode: 1081611879
State: WAITING
"jrpp-494" prio=5 in Object.wait()
java.lang.Object.wait(Object.java:???)[Native Method]
- waiting on <0x9253305> (a coldfusion.util.AbstractCache$Lock)
java.lang.Object.wait(Object.java:485)
coldfusion.util.AbstractCache.fetch(AbstractCache.java:46)
coldfusion.util.SoftCache.get_statsOff(SoftCache.java:133)
coldfusion.util.SoftCache.get(SoftCache.java:81)
coldfusion.runtime.TemplateClassLoader.findClass(TemplateClassLoader.java:609)
coldfusion.runtime.RuntimeServiceImpl.getFile(RuntimeServiceImpl.java:785)
coldfusion.runtime.RuntimeServiceImpl.resolveTemplatePath(RuntimeServiceImpl.java:766)
coldfusion.tagext.lang.CustomTag.setName(CustomTag.java:21)
cfApplication2ecfm456206189._factor0(/srv/www/htdocs/www.example.com/www/Application.cfm:28)
cfApplication2ecfm456206189.runPage(/srv/www/htdocs/www.example.com/www/Application.cfm:1)
coldfusion.runtime.CfJspPage.invoke(CfJspPage.java:231)
coldfusion.tagext.lang.IncludeTag.doStartTag(IncludeTag.java:416)
coldfusion.filter.CfincludeFilter.invoke(CfincludeFilter.java:65)
coldfusion.filter.CfincludeFilter.include(CfincludeFilter.java:33)
coldfusion.filter.ApplicationFilter.invoke(ApplicationFilter.java:279)
coldfusion.filter.RequestMonitorFilter.invoke(RequestMonitorFilter.java:48)
coldfusion.filter.MonitoringFilter.invoke(MonitoringFilter.java:40)
coldfusion.filter.PathFilter.invoke(PathFilter.java:94)
coldfusion.filter.ExceptionFilter.invoke(ExceptionFilter.java:70)
coldfusion.filter.ClientScopePersistenceFilter.invoke(ClientScopePersistenceFilter.java:28)
coldfusion.filter.BrowserFilter.invoke(BrowserFilter.java:38)
coldfusion.filter.NoCacheFilter.invoke(NoCacheFilter.java:46)
coldfusion.filter.GlobalsFilter.invoke(GlobalsFilter.java:38)
coldfusion.filter.DatasourceFilter.invoke(DatasourceFilter.java:22)
coldfusion.filter.CachingFilter.invoke(CachingFilter.java:62)
coldfusion.CfmServlet.service(CfmServlet.java:200)
coldfusion.bootstrap.BootstrapServlet.service(BootstrapServlet.java:89)
jrun.servlet.FilterChain.doFilter(FilterChain.java:86)
com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doHttpServletRequest(FusionReactorCoreFilter.java:503)
com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doFusionRequest(FusionReactorCoreFilter.java:337)
com.intergral.fusionreactor.filter.FusionReactorCoreFilter.doFilter(FusionReactorCoreFilter.java:246)
com.intergral.fusionreactor.filter.FusionReactorFilter.doFilter(FusionReactorFilter.java:121)
jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
coldfusion.monitor.event.MonitoringServletFilter.doFilter(MonitoringServletFilter.java:42)
coldfusion.bootstrap.BootstrapFilter.doFilter(BootstrapFilter.java:46)
jrun.servlet.FilterChain.doFilter(FilterChain.java:94)
jrun.servlet.FilterChain.service(FilterChain.java:101)
jrun.servlet.ServletInvoker.invoke(ServletInvoker.java:106)
jrun.servlet.JRunInvokerChain.invokeNext(JRunInvokerChain.java:42)
jrun.servlet.JRunRequestDispatcher.invoke(JRunRequestDispatcher.java:286)
jrun.servlet.ServletEngineService.dispatch(ServletEngineService.java:543)
jrun.servlet.jrpp.JRunProxyService.invokeRunnable(JRunProxyService.java:203)
jrunx.scheduler.ThreadPool$DownstreamMetrics.invokeRunnable(ThreadPool.java:320)
jrunx.scheduler.ThreadPool$ThreadThrottle.invokeRunnable(ThreadPool.java:428)
jrunx.scheduler.ThreadPool$UpstreamMetrics.invokeRunnable(ThreadPool.java:266)
jrunx.scheduler.WorkerThread.run(WorkerThread.java:66)
If you're interested, you can see the full stack trace with what I'll call 'the locking script' at the top and all the others waiting on it.
When I first encountered this issue, I didn't have the stack traces. I posted the question, "When ColdFusion is maxing out the CPU, how do I find out what it's chewing/choking on?". I received many helpful responses and by looking at the stack traces I was able to determine that it was the same three scripts that were causing this deadlock issue over and over again.
In each case the top line in 'the locking script' reads:
coldfusion.compiler.ClassReader.skipFully(ClassReader.java:79)
And all the other requests are clogged up behind it have the following line in their respective stack traces:
- waiting on <0x9253305> (a coldfusion.util.AbstractCache$Lock)
One thing that was bothering me, was why my request timeout was not being respected; these scripts would just hang forever and never die. WTF, right? So I had to do it myself. So then when I kill 'the locking script', the others are released from limbo. At that point, if they are below the request timeout, they finish processing, and if they are over it (which most all of them usually are) then they simply proceed to timeout. But they won't timeout on their own and the requests just pile up until the active threads are used and the thread queue is full and everything seizes up.
Manually killing these each time they're requested is obviously not a solution, so, as my wife is always reminding me, "debug, debug, debug". Using a conditional <cfabort>
I stepped through and found that it was making it all the way through the Application.cfm, through my header.cfm, and right up to the <cfinclude>
of the problem script. If I put the <cfabort>
inside the problem script (even at the very top), it would not abort and the deadlock issue would occur. If I put it just before the include, the request would abort and the deadlock issue would be avoided. Bizarre.
There's no code between these two places, right? Just before the include and just inside the include ought to be functionally equivalent, no? Probably not, because clearly something is going on in there.
I'm not using any <cflock>
tags. The locking that is happening is appears to be happening at the template cache level. This same behavior is observed regardless of whether the 'Trusted Cache', 'Cache Template In Request', or "Component Cache' options are checked in admin (in any combination of checked/unchecked). I've cleared the template cache and the component cache each more than once. I restarted the CF server over and over...all to no avail.
During troubleshooting, I read this article describing a similar issue with a compiler cache lock in CF8 (8.0.1) along with instructions for applying a patch to fix it. But that isn't CF9...so obviously I can't apply their patch.
What to do? Has anybody else encountered this issue? ...And have a solution?
It turns out that sometimes the class files are corrupted and need to be regenerated. When you experience this issue, it will manifest as described above, with a deadlock while attempting to load the corrupted class file.The steps to regenerate the class files are simple, as follows:
Go to the ColdFusion Administrator > Server Settings > Caching
Uncheck the following option:
[ ] Save class files
When you select this option, the class files generated by ColdFusion are saved to disk for reuse after the server restarts. Adobe recommends this for production systems. During development, Adobe recommends that you do not select this option.
Click the [Submit Changes] button
Restart the ColdFusion server
Go to the ColdFusion Administrator > Server Settings > Caching
Check the following option:
[x] Save class files
When you select this option, the class files generated by ColdFusion are saved to disk for reuse after the server restarts. Adobe recommends this for production systems. During development, Adobe recommends that you do not select this option.
Click the [Submit Changes] button
And your done! The problem disappears entirely, and all is once again as it should be. ;-)
Why and how do the class files become corrupted? I don't know. Perhaps this could be the subject of another question. All I know is, this solves the problem. I'm usually hesitant to accept my own answers as authoritative, so, if someone else has a better explanation of this problem and its solution, please feel free to post it.