
Crash inside fork(); forking from a different thread eliminates the crash


TL;DR: In the situation outlined below, calling fork() crashes. If I call fork() on a new thread instead, it does not crash. What could the simulator be doing that would cause fork() to crash?


I'm working on something that runs inside vxsim (a VxWorks emulation environment), and while exploring some of its limits I ran into this problem.

vxsim runs as a single-threaded process with its own internal task scheduler. It uses an interval timer (setitimer(), which delivers SIGALRM) to simulate the appropriate interrupt, and while handling it, it uses setcontext() (on Solaris; on Linux it does something similar) to make the current (and only) thread resume execution in a different task.

There's little or no real distinction between "sim" functions and "host" functions. The ABI is the same, so if you happen to have the address of a host function (e.g., fork()), it can be called directly. The only catch is that there's a function that should be called first to disable simulated interrupts and the like.

As an experiment I made it call fork(), and it crashes during that call. If I instead create a new thread and call fork() on that thread, it does not crash.

The simulation behaves this way on both x86 Linux and SPARC Solaris 10.

For comparison, I did the same thing in a much older version of VxWorks (5.x), and it works just fine.

What I'm fishing for is an answer to this question:

What could they be doing that would cause fork() to crash like this?


Solution

  • I found this [erroneous] bug report filed against glibc, in which the OP writes [more or less]:

    I called sigaction() with a mostly uninitialized struct sigaction when setting up a handler for SIGCHLD. Later, when I call fork() and the SIGCHLD handler executes, my program crashes.

    I managed to prove that they are doing exactly that in at least one spot in their code. To test whether that was the culprit, I zero-initialized the stack below %ESP before the offending function is called. Unfortunately it still crashes, though I think it's very likely there are other spots doing the same sort of thing.

    Adding insult to injury, the company is unwilling to fix it unless we pay extra. Sigh.