I have a very large multithreaded C application (for this and some other reasons I cannot provide the entire code). Long story short, among others there are two threads: thread 1 sends SIGINT to thread 2 while thread 2 waits in ppoll(). The code looks something like this:
Thread 1:
...
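int err;
/* pthread_kill takes the pthread_t by value and returns an error number; it does not set errno */
if (thread && (err = pthread_kill(thread, SIGINT)) != 0)
{
    fprintf(log1, "Got error: %s", strerror(err));
}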
...
Here thread is a pthread_t variable containing the handle of thread 2.
Thread 2:
...
sigset_t mask;
int rc;
sigemptyset(&mask); /* sigemptyset takes a pointer to the set */
...
while(true)
{
    ...
    /* empty mask: no signals are blocked while waiting in ppoll */
    rc = ppoll(fds, nfds, NULL, &mask);
    if (rc < 0)
    {
        if (errno == EINTR) fprintf(log2, "Got EINTR");
        else fprintf(log2, "Got error: %s", strerror(errno));
        continue;
    }
    fprintf(log2, "%d descriptor(s) ready", rc);
}
...
When thread 1 calls pthread_kill, thread 2 is expected to drop out of ppoll() and print "Got EINTR" to the log, but for some reason the entire process gets terminated.
Other things I currently know:
1. The code worked as expected in the past. Since then some seemingly unrelated changes were made to allow the application to load data from a Postgres database rather than Oracle (the old implementation used Pro*C, the new one uses libpq). Nothing was altered in the source code of the functions responsible for working with threads.
2. I checked which threads create which, what signals are being sent (from/to) and in what order all of this happens. All of that looked identical between the old and new versions, except that the new version stops working the moment thread 1 sends SIGINT to thread 2, while the old version continues working, with thread 2 successfully printing "Got EINTR" to the log.
2.5. I am absolutely sure that thread 1 sends the SIGINT signal after thread 2 reaches ppoll().
3. I tried using gdb and with great effort was able to confirm once again that the correct thread receives SIGINT, but sadly nothing more.
4. I tried creating a minimal reproducible example but have not managed to come up with one yet (everything works as expected).
I have very little experience with signals (and even less experience with using them in multithreaded applications, but it is what it is), so I likely did not provide enough information. I'm not expecting an exact answer, just some ideas on how to find out what could be causing this issue. I will update the question with more specific info if it is requested in the comments.
The old Pro*C library rudely installs signal handlers, including, presumably, one for SIGINT. libpq does not.
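Thus, when the SIGINT is delivered during ppoll, the Oracle-aware version of the code runs its SIGINT action and carries on. The Postgres-aware version has no handler installed, leaving you with the default disposition (SIG_DFL), which for SIGINT is process termination. Signal dispositions are process-wide, so even though the signal is directed at thread 2 only, the default action terminates the entire process.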
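One way to check this theory is to ask the kernel what the current SIGINT disposition is, once in the old build and once in the new one, shortly before thread 1 sends the signal. The following is a minimal sketch; the function name has_default_action is made up for illustration and is not part of either library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if signo still has its default action
 * (SIG_DFL), 0 if a handler is installed or the signal is ignored,
 * and -1 if sigaction() itself fails. */
int has_default_action(int signo)
{
    struct sigaction cur;

    assert(signo > 0);

    /* A NULL second argument queries the disposition without changing it. */
    if (sigaction(signo, NULL, &cur) != 0)
        return -1;

    /* With SA_SIGINFO set, a three-argument handler is installed. */
    if (cur.sa_flags & SA_SIGINFO)
        return 0;

    return cur.sa_handler == SIG_DFL;
}
```

If has_default_action(SIGINT) returns 0 in the Pro*C build and 1 in the libpq build, something in the old build installed a handler (or ignored the signal) and the new build did not.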
As you discovered, installing a dummy or noop handler for SIGINT fixes the problem.
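For reference, here is a minimal sketch of that fix together with a small self-check. The names noop_handler, install_noop_sigint, waiter and run_demo are invented for this example, and the 5-second timeout exists only so the demo cannot hang; the original code waits with a NULL timeout. The demo blocks SIGINT everywhere except inside ppoll, so the signal cannot arrive before the wait begins.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Does nothing; its only purpose is to replace SIG_DFL so that SIGINT
 * interrupts blocking calls instead of terminating the process. */
static void noop_handler(int signo) { (void)signo; }

/* Installs the no-op handler process-wide. Returns 0 on success. */
int install_noop_sigint(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, NULL);
}

/* Plays the role of thread 2: waits in ppoll with an empty signal mask. */
static void *waiter(void *arg)
{
    int *result = arg;
    sigset_t empty;
    struct timespec timeout = { .tv_sec = 5, .tv_nsec = 0 };

    assert(arg != NULL);
    sigemptyset(&empty);

    /* SIGINT is blocked in this thread except while ppoll is waiting. */
    int rc = ppoll(NULL, 0, &timeout, &empty);
    *result = (rc < 0) ? errno : 0;
    return NULL;
}

/* Plays the role of thread 1. Returns the errno seen by the waiter
 * (EINTR is the expected value), 0 on timeout, or -1 on setup failure. */
int run_demo(void)
{
    pthread_t tid;
    sigset_t block, old;
    int result = -1;

    if (install_noop_sigint() != 0)
        return -1;

    /* Block SIGINT before creating the thread; the new thread inherits
     * the mask, so the signal stays pending until ppoll unblocks it. */
    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    if (pthread_sigmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    if (pthread_create(&tid, NULL, waiter, &result) != 0)
        return -1;

    /* pthread_kill returns an error number instead of setting errno. */
    int err = pthread_kill(tid, SIGINT);

    pthread_join(tid, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return (err != 0) ? -1 : result;
}
```

With the handler installed, run_demo() should return EINTR. Without the call to install_noop_sigint(), and with SIGINT at SIG_DFL, the same pthread_kill terminates the whole process, which matches the behaviour described in the question.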