redis solaris zombie-process

Zombie process in Solaris 10 even with wait


I'm working on getting Redis to run on Solaris 10 and there are a few integration tests that are failing. The test I'm looking into works like this:

In spite of the wait3() call, the child ends up in a zombie state.

The test fails around 90% of the time when I run it. Once it gets into a failed state it never recovers. I tried changing the test to wait significantly longer, and although it appears to call wait3() many times after the process has exited, the child stays a zombie until the parent process(es) are killed.
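For reference, here is a minimal sketch of the non-blocking reap pattern I would expect to clear a zombie. It is not the Redis code: the names reap_with_wait3 and fork_and_reap, the 10 ms poll interval and the poll limit are all made up for illustration. It only shows that polling wait3() with WNOHANG should eventually return the child's pid and remove the zombie entry.

```c
#define _DEFAULT_SOURCE   /* exposes wait3() on glibc in strict modes; harmless elsewhere */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll wait3() without blocking until `child` is reaped
 * or `max_polls` attempts have been made.
 * Returns the child's exit code, -1 on error or death by signal, -2 on timeout.
 * Note that wait3() reaps ANY child, so this sketch assumes `child` is the
 * only child of the calling process. */
int reap_with_wait3(pid_t child, int max_polls)
{
    int i;

    assert(child > 0);
    for (i = 0; i < max_polls; i++) {
        int status;
        struct timespec ts;
        pid_t pid = wait3(&status, WNOHANG, NULL);

        if (pid == child)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (pid == -1 && errno != EINTR)
            return -1;              /* e.g. ECHILD: nothing left to wait for */

        /* pid == 0 means no child has exited yet: sleep 10 ms and retry. */
        ts.tv_sec = 0;
        ts.tv_nsec = 10 * 1000 * 1000;
        nanosleep(&ts, NULL);
    }
    return -2;
}

/* Hypothetical driver: fork a child that exits immediately with `code`,
 * then reap it with the polling loop above. */
int fork_and_reap(int code)
{
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(code);                /* child: exit right away */
    return reap_with_wait3(child, 500);
}
```

If a loop like this really does run in the parent and the child is still listed as a zombie, then either wait3() is being called in a different process than the child's actual parent, or the call is never reached.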

Unfortunately I won't be able to work on this again until next week, so I'm researching it from home. Most of my googling has only turned up documentation or "why do processes become zombies?" type questions.

This Google Groups thread from the mid-90s may help, though it mostly covers older releases of Solaris / SunOS.


Solution

  • I was mistaken. It looks like the master node doesn't see that its child failed, so it never calls wait.
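To illustrate that failure mode, here is a rough sketch, assuming the parent only calls wait when its own bookkeeping says a child is outstanding. This is a guess at the general shape, not the actual Redis logic; tracked_child, reap_if_tracked, is_zombie and untracked_child_stays_zombie are hypothetical names. If the bookkeeping never records the child, the guard skips the wait call and the exited child stays a zombie.

```c
#define _DEFAULT_SOURCE   /* exposes waitid() on glibc in strict modes; harmless elsewhere */
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical bookkeeping: pid the parent believes it must reap, or -1. */
static pid_t tracked_child = -1;

/* Reap only when the parent believes a child is outstanding.
 * Returns 1 if a child was reaped, 0 if no wait call was made or it failed. */
int reap_if_tracked(void)
{
    int status;

    if (tracked_child == -1)
        return 0;                   /* parent sees nothing to wait for */
    if (waitpid(tracked_child, &status, 0) == tracked_child) {
        tracked_child = -1;
        return 1;
    }
    return 0;
}

/* Returns 1 if `pid` is a child that has exited but is not yet reaped.
 * WNOWAIT leaves the child in its waitable (zombie) state. */
int is_zombie(pid_t pid)
{
    siginfo_t info;

    info.si_pid = 0;                /* stays 0 if nothing is waitable */
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOHANG | WNOWAIT) != 0)
        return 0;
    return info.si_pid == pid;
}

/* Returns 1 if the child stayed a zombie while untracked and disappeared
 * once the parent tracked and reaped it; 0 or -1 otherwise. */
int untracked_child_stays_zombie(void)
{
    siginfo_t info;
    int still_zombie, reaped;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(1);                   /* child "fails" immediately */

    /* Block until the child has exited, without reaping it. */
    if (waitid(P_PID, (id_t)child, &info, WEXITED | WNOWAIT) != 0)
        return -1;

    /* The parent never recorded the child, so the guard skips the wait. */
    if (reap_if_tracked() != 0)
        return -1;
    still_zombie = is_zombie(child);

    /* Once the parent knows about the child, the zombie is cleared. */
    tracked_child = child;
    reaped = reap_if_tracked();
    return still_zombie && reaped && !is_zombie(child);
}
```

In this sketch the fix is on the bookkeeping side: the wait call itself works, it is simply never reached while the parent is unaware that the child has exited.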