I have a long-running Java process on a CentOS machine. Both info and error logs are set up properly. The process ran for a long time (18+ hours) and then disappeared all of a sudden. There is no trace of an error or exception (no OutOfMemoryError, no out-of-disk-space error). How can I figure out what really happened, that is, why and how the process got killed?
These are the OS details.
CentOS release 5.11 (Final)
Kernel \r on an \m
Are there any standard system logs or commands I can use to figure this out? The job runs in a servlet in Tomcat, and Tomcat is also going down mysteriously.
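Your process was most likely killed by the kernel's OOM killer because the system ran out of memory. When that happens, the kernel prefers to kill short-running processes over long-running ones, but a process with a large memory footprint, such as a JVM, can still be chosen. The OOM killer typically terminates its victim with SIGKILL, so the process gets no chance to write anything and the kill is unlikely to appear in your application logs.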
Check dmesg and look there for a message about killing <java_pid>.
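As a rough sketch of that check, the snippet below filters kernel log text for lines that look like OOM-killer activity. The function names and the regular expression are assumptions made for illustration; the exact wording of these messages varies between kernel versions, so adjust the pattern to what your kernel prints. It also assumes a Python 3.7+ interpreter, which a stock CentOS 5 install does not ship; on such a machine, piping dmesg through grep achieves the same thing.

```python
import re
import subprocess

# Assumed pattern: OOM-killer message wording differs between kernel versions.
OOM_PATTERN = re.compile(r"out of memory|oom-killer|killed process", re.IGNORECASE)


def find_oom_lines(log_text):
    """Return the lines of log_text that look like OOM-killer messages."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]


def read_dmesg():
    """Return the kernel ring buffer as text by running the dmesg command."""
    result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    return result.stdout
```

Calling `find_oom_lines(read_dmesg())` returns the suspicious lines; compare the PID in them against the PID your Java process had.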
Here is how the "badness" of a task is determined in Linux when choosing what to kill (https://www.kernel.org/doc/gorman/html/understand/understand016.html#toc21):
badness_for_task = total_vm_for_task / (sqrt(cpu_time_in_seconds) *
sqrt(sqrt(cpu_time_in_minutes)))
The kernel steps through all running tasks, computes this value for each, and selects the task with the highest badness to kill.
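To get a feel for the formula, here is a minimal sketch that evaluates it for a few made-up tasks and picks the one with the highest score. The task names, the numbers, and the clamping of CPU time to at least one second and one minute (to avoid dividing by zero) are assumptions of this sketch, not kernel behavior taken from the linked page.

```python
import math


def badness(total_vm, cpu_time_seconds):
    """Evaluate the badness formula quoted above for one task."""
    # Assumption: clamp both time terms to 1 so a brand-new task does not divide by zero.
    seconds = max(cpu_time_seconds, 1.0)
    minutes = max(cpu_time_seconds / 60.0, 1.0)
    return total_vm / (math.sqrt(seconds) * math.sqrt(math.sqrt(minutes)))


def select_victim(tasks):
    """Return the name of the task with the highest badness.

    tasks maps a task name to a (total_vm, cpu_time_seconds) tuple.
    """
    return max(tasks, key=lambda name: badness(*tasks[name]))


# Hypothetical tasks: a large JVM and a small daemon, each with one hour of CPU time.
example_tasks = {
    "java": (2_000_000, 3600),
    "crond": (5_000, 3600),
}
victim = select_victim(example_tasks)
```

With these hypothetical numbers the JVM is selected: its long run time lowers its score, but not enough to offset a memory footprint that is hundreds of times larger.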