At work, we've been having a problem with "PermGen out of memory" exceptions, with the team lead deciding it was a bug in the JVM - something related to hot-deployment of code. Without explaining many details, he pointed out that hot deployment is a "hard problem", so hard that even .NET doesn't do it yet.
I found a lot of articles explaining hot deployment from the bird's-eye-view, but always lacking technical details. Could anyone point me to a technical explanation, and explain why hot deployment is "a hard problem"?
When a class is loaded, various static data about the class is stored in PermGen. As long as a live reference to this Class instance exists, the class instance cannot be garbage collected.
I believe that part of the problem has to do with whether or not the GC should remove old Class instances from perm gen, or not. Typically, every time you hot deploy, new class instances are added to the PermGen memory pool, and the old ones, now unused, are typically not removed. By default, the Sun JVMs will not run garbage collection in PermGen, but this can be enabled with optional "java" command arguments.
Therefore, if you hot deploy enough times, you will eventually exhaust your PermGen space.
If your web app does not shut down completely when undeployed -- if it leaves a Thread running, for example -- then all of the Class instances used by that web app will be pinned in the PermGen space. You redeploy and now have another whole copy of all of these Class instances loaded into PermGen. You undeploy and the Thread keeps going, pinning ANOTHER set of class instances in PermGen. You redeploy and load a whole net set of copies... and eventually your PermGen fills up.
You can sometimes fix this by:
-XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled
But this will help only if your web app shuts down completely and cleanly, leaving no live references to any of the Class instances of any Class loaded by the Class loaders for that Web App.
Even this will not necessarily fix the problem, due to class loader leaks. (As well as too many interned strings in some cases.)
Check out the following links for more (the two bolded ones have nice diagrams to illustrate part of the problem).