init never reaping zombie/defunct processes


On my Fedora Core 9 webserver with kernel 2.6.18, init isn't reaping zombie processes. This would be bearable if it weren't for the process table eventually filling up, at which point no new processes can be created.

Sample output of ps -el | grep 'Z':

F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
5 Z     0  2648     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
1 Z    51  2656     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
1 Z     0  2670     1  0  75   0 -     0 exit   ?        00:00:02 crond <defunct>
4 Z     0  2874     1  0  82   0 -     0 exit   ?        00:00:00 mysqld_safe <defunct>
5 Z     0 28104     1  0  76   0 -     0 exit   ?        00:00:00 httpd <defunct>
5 Z     0 28716     1  0  76   0 -     0 exit   ?        00:00:06 lfd <defunct>
5 Z    74 10172     1  0  75   0 -     0 exit   ?        00:00:00 sshd <defunct>
5 Z     0 11199     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11202     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11205     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11208     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11211     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11240     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11246     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11249     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
5 Z     0 11252     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>
1 Z     0 14106     1  0  80   0 -     0 exit   ?        00:00:00 anacron <defunct>
5 Z     0 14631     1  0  75   0 -     0 exit   ?        00:00:00 sendmail <defunct>

Is this an OS bug? A misconfiguration? I'm looking for ideas about the source of this problem. Thanks.


Solution

  • This has hit me on Ubuntu in two ways:

    1. Something was wrong with the kernel. In my case a kernel driver had crashed and left the kernel's process bookkeeping in a bad state. The best way to test for this is to check /var/log/syslog (on Fedora, /var/log/messages) and dmesg for anything that looks awry, for example "BUG: unable to handle kernel NULL pointer dereference at 0000000000000028".

    2. The other time I've seen this is when init is not the "parent of the child process for most purposes" (actual manpage quote). This can happen when you use the ptrace syscall (which the strace program uses internally) to attach to a process. For instance, I've gotten into a situation where I attached strace to child process B. Eventually process B terminated, as did its parent (I'm not sure in which order). Process B then looked like a zombie owned by init. However, its "most purposes" parent was actually the strace program, so init could not reap it while strace was still attached. After I killed strace, process B was reaped.
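
A rough way to check for the second case is to look at each zombie's /proc/<pid>/status and see whether a tracer is still attached. The sketch below is only an illustration, not a definitive tool. It assumes a Linux /proc whose status files expose the State, PPid and TracerPid fields (worth confirming on a kernel as old as 2.6.18), and the function names parse_status, zombie_info and find_zombies are made up for this example.

```python
import os


def parse_status(text):
    """Parse the 'Key:<tab>value' lines of a /proc/<pid>/status file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def zombie_info(fields):
    """Return (name, ppid, tracer_pid) if the process is a zombie, else None."""
    if not fields.get("State", "").startswith("Z"):
        return None
    return (
        fields.get("Name", "?"),
        int(fields.get("PPid", 0)),
        int(fields.get("TracerPid", 0)),  # 0 means nothing is tracing it
    )


def find_zombies(proc_root="/proc"):
    """Yield (pid, name, ppid, tracer_pid) for every zombie found under proc_root."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                info = zombie_info(parse_status(fh.read()))
        except OSError:
            continue  # the process went away or is not readable
        if info:
            yield (int(entry),) + info


if __name__ == "__main__":
    for pid, name, ppid, tracer in find_zombies():
        note = "traced by PID %d" % tracer if tracer else "not traced"
        print("%6d  %-16s  PPid=%d  %s" % (pid, name, ppid, note))
```

If a zombie shows PPid=1 together with a non-zero tracer PID, the tracer (strace, gdb or similar) is the first thing I would look at. If nothing is being traced, the kernel log from the first case is the more likely place to find an explanation.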