I recently wrote a signal handler that uses backtrace from execinfo.h. It works fine on macOS, but on Linux (Ubuntu/Debian) it waits on a lock indefinitely. I'm not sure if this helps, but my multi-threaded (pthread) program uses rocksdb to store data, and I intentionally introduced a segfault in rocksdb so that I could test my signal handler in case an issue occurs on the rocksdb side. However, I am not able to debug why it is stuck waiting for the lock.
This is the stack trace I got in gdb:
#0 futex_wait (private=0, expected=2, futex_word=0x77e088a1ac80 <main_arena>) at ../sysdeps/nptl/futex-internal.h:146
#1 __GI___lll_lock_wait_private (futex=futex@entry=0x77e088a1ac80 <main_arena>) at ./nptl/lowlevellock.c:34
#2 0x000077e0888a53c8 in __GI___libc_malloc (bytes=408) at ./malloc/malloc.c:3327
#3 0x000077e088c024a3 in malloc (size=408) at ../include/rtld-malloc.h:56
#4 _dl_scope_free (old=old@entry=0x5660325219f0) at ./elf/dl-scope.c:34
#5 0x000077e088bf3308 in _dl_map_object_deps (map=map@entry=0x566032520dc0, preloads=preloads@entry=0x0, npreloads=npreloads@entry=0,
trace_mode=trace_mode@entry=0, open_mode=open_mode@entry=-2147483648) at ./elf/dl-deps.c:635
#6 0x000077e088bfda0f in dl_open_worker_begin (a=a@entry=0x7fff4a7a5010) at ./elf/dl-open.c:592
#7 0x000077e088974a98 in __GI__dl_catch_exception (exception=exception@entry=0x7fff4a7a4e70, operate=operate@entry=0x77e088bfd900 <dl_open_worker_begin>,
args=args@entry=0x7fff4a7a5010) at ./elf/dl-error-skeleton.c:208
#8 0x000077e088bfcf9a in dl_open_worker (a=a@entry=0x7fff4a7a5010) at ./elf/dl-open.c:782
#9 0x000077e088974a98 in __GI__dl_catch_exception (exception=exception@entry=0x7fff4a7a4ff0, operate=operate@entry=0x77e088bfcf60 <dl_open_worker>,
args=args@entry=0x7fff4a7a5010) at ./elf/dl-error-skeleton.c:208
#10 0x000077e088bfd34e in _dl_open (file=<optimized out>, mode=-2147483646, caller_dlopen=0x77e088925611 <__GI___libc_unwind_link_get+81>, nsid=-2, argc=3,
argv=<optimized out>, env=0x5660324f9fe0) at ./elf/dl-open.c:883
#11 0x000077e088974e01 in do_dlopen (ptr=ptr@entry=0x7fff4a7a5240) at ./elf/dl-libc.c:95
#12 0x000077e088974a98 in __GI__dl_catch_exception (exception=exception@entry=0x7fff4a7a51e0, operate=<optimized out>, args=<optimized out>)
at ./elf/dl-error-skeleton.c:208
#13 0x000077e088974b63 in __GI__dl_catch_error (objname=0x7fff4a7a5230, errstring=0x7fff4a7a5238, mallocedp=0x7fff4a7a522f, operate=<optimized out>,
args=<optimized out>) at ./elf/dl-error-skeleton.c:227
#14 0x000077e088974f37 in dlerror_run (args=0x7fff4a7a5240, operate=0x77e088974dc0 <do_dlopen>) at ./elf/dl-libc.c:45
#15 __libc_dlopen_mode (name=name@entry=0x77e0889db527 "libgcc_s.so.1", mode=mode@entry=-2147483646) at ./elf/dl-libc.c:162
#16 0x000077e088925611 in __GI___libc_unwind_link_get () at ./misc/unwind-link.c:50
#17 __GI___libc_unwind_link_get () at ./misc/unwind-link.c:40
#18 0x000077e088933b77 in __GI___backtrace (array=array@entry=0x77e088af0000 <backtrace_frames>, size=size@entry=1) at ./debug/backtrace.c:69
#19 0x000077e088a65f92 in dumpBackTrace () at my_faultHandler.c:366
#20 0x000077e088a66027 in faultHandler (signo=6) at my_faultHandler.c:344
#21 <signal handler called>
#22 __pthread_kill_implementation (no_tid=0, signo=6, threadid=131806249563968) at ./nptl/pthread_kill.c:44
#23 __pthread_kill_internal (signo=6, threadid=131806249563968) at ./nptl/pthread_kill.c:78
#24 __GI___pthread_kill (threadid=131806249563968, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#25 0x000077e088842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#26 0x000077e0888287f3 in __GI_abort () at ./stdlib/abort.c:79
#27 0x000077e088889676 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x77e0889dbb77 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#28 0x000077e0888a0cfc in malloc_printerr (str=str@entry=0x77e0889de5b8 "malloc_consolidate(): unaligned fastbin chunk detected") at ./malloc/malloc.c:5664
#29 0x000077e0888a198c in malloc_consolidate (av=av@entry=0x77e088a1ac80 <main_arena>) at ./malloc/malloc.c:4750
#30 0x000077e0888a3bdb in _int_malloc (av=av@entry=0x77e088a1ac80 <main_arena>, bytes=bytes@entry=32816) at ./malloc/malloc.c:3965
#31 0x000077e0888a5139 in __GI___libc_malloc (bytes=bytes@entry=32816) at ./malloc/malloc.c:3329
#32 0x000077e0888e630b in __alloc_dir (statp=0x7fff4a7a5ec0, flags=0, close_fd=true, fd=39) at ../sysdeps/unix/sysv/linux/opendir.c:115
#33 opendir_tail (fd=39) at ../sysdeps/unix/sysv/linux/opendir.c:63
#34 __opendir (name=<optimized out>) at ../sysdeps/unix/sysv/linux/opendir.c:86
#35 0x000077e087f93748 in rocksdb::(anonymous namespace)::PosixEnv::GetChildren (this=<optimized out>,
dir="/home/dummy/rocks", result=0x7fff4a7a6080)
at /usr/include/c++/9/bits/basic_string.h:2309
#36 0x000077e087eeaae0 in rocksdb::DBImpl::FindObsoleteFiles (this=this@entry=0x56603283fc40, job_context=job_context@entry=0x7fff4a7a6180, force=force@entry=true,
no_full_scan=no_full_scan@entry=false) at db/db_impl_files.cc:200
#37 0x000077e087eccfd3 in rocksdb::DBImpl::~DBImpl (this=0x56603283fc40, __in_chrg=<optimized out>) at db/db_impl.cc:308
#38 0x000077e087ecd3f6 in rocksdb::DBImpl::~DBImpl (this=0x56603283fc40, __in_chrg=<optimized out>) at db/db_impl.cc:357
#39 0x000077e087e66e9d in rocksdb_close (db=0x5660328a2b20) at db/c.cc:627
Signal handler code:
void RegisterFaultHandler(void)
{
    struct sigaction bt_action;
    sigemptyset(&bt_action.sa_mask);
    bt_action.sa_handler = &faultHandler;
    bt_action.sa_flags = SA_RESTART | SA_ONSTACK;
    if (sigaction(SIGSEGV, &bt_action, prev_action + SIGSEGV) || sigaction(SIGBUS, &bt_action, prev_action + SIGBUS) ||
        sigaction(SIGILL, &bt_action, prev_action + SIGILL) || sigaction(SIGABRT, &bt_action, prev_action + SIGABRT) ||
        sigaction(SIGFPE, &bt_action, prev_action + SIGFPE) || sigaction(SIGSYS, &bt_action, prev_action + SIGSYS))
    {
        int savedErrno = errno;
        exit(1);
    }
}
static void unRegisterFaultHandler()
{
    /* Install 'previous' fault handler for all 'crash' (fatal) signals */
    sigaction(SIGSEGV, prev_action + SIGSEGV, NULL);
    sigaction(SIGBUS, prev_action + SIGBUS, NULL);
    sigaction(SIGILL, prev_action + SIGILL, NULL);
    sigaction(SIGABRT, prev_action + SIGABRT, NULL);
    sigaction(SIGFPE, prev_action + SIGFPE, NULL);
    sigaction(SIGSYS, prev_action + SIGSYS, NULL);
}
static void faultHandler(int signo)
{
    /* Disable fault_handler to call previous fault handlers, if any */
    unRegisterFaultHandler();
    dumpBackTrace();
    /* Propagate the signal back to, previous handler */
    raise(signo);
}
static void dumpBackTrace()
{
    int bt_fd = openBackTraceFile(); /* This will just open my file with open() system call */
    if (bt_fd >= 0)
    {
        static void *backtrace_frames[10];
        int size = backtrace(backtrace_frames, 10);
        backtrace_symbols_fd(backtrace_frames, size, bt_fd);
        close(bt_fd);
    }
    else
    {
        const char error[] = "Cannot open backtrace file\n";
        (void)write(STDERR_FILENO, error, sizeof(error));
    }
}
I understand that the malloc call in frame #3 might be the cause, since it is unsafe, but I don't know how to fix it. I tried searching the internet for answers, but I was only able to get as far as the malloc part. Please let me know if you need any more info.
EDIT-1: I only get this issue when the segmentation fault occurs on the rocksdb side; I don't get it for faults in my own program. I think it might be because the error originates in malloc_consolidate and backtrace itself then calls malloc again.
Your signal handler is fundamentally broken - it is not async-signal-safe.
Per C11 7.1.4 Use of library functions, paragraph 4:
The functions in the standard library are not guaranteed to be reentrant and may modify objects with static or thread storage duration. 188)
Note the link to footnote 188:
188) Thus, a signal handler cannot, in general, call standard library functions.
Per POSIX 7 "Signal Concepts" and the Linux signal-safety man page, there is a limited set of "async-signal-safe" functions that may be called from within a signal handler.
Functions not on those lists cannot be safely called from within a signal handler.
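As a rough illustration of what staying on those lists looks like, here is a minimal sketch of a handler that reports the signal number using only write(), signal() and raise(), all of which appear on the POSIX async-signal-safe list. The names format_decimal and minimal_fault_handler are hypothetical and not part of the code in the question; the number is formatted by hand because printf() and friends are not on the list.

```c
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: render v in decimal into buf without printf().
 * Returns the number of characters written (no terminating NUL). */
static size_t format_decimal(unsigned long v, char *buf, size_t cap)
{
    char tmp[24];                     /* enough for a 64-bit value */
    size_t n = 0;

    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0 && n < sizeof tmp);

    size_t len = n < cap ? n : cap;   /* truncate if the caller's buffer is small */
    for (size_t i = 0; i < len; i++)
        buf[i] = tmp[n - 1 - i];      /* digits were produced in reverse order */
    return len;
}

/* Hypothetical handler: no malloc(), no stdio, no locks taken in user space. */
void minimal_fault_handler(int signo)
{
    static const char prefix[] = "fatal signal ";
    char digits[24];
    size_t n = format_decimal((unsigned long)signo, digits, sizeof digits);

    (void)write(STDERR_FILENO, prefix, sizeof prefix - 1);
    (void)write(STDERR_FILENO, digits, n);
    (void)write(STDERR_FILENO, "\n", 1);

    signal(signo, SIG_DFL);           /* restore the default action ... */
    raise(signo);                     /* ... and let the signal terminate the process */
}
```

This sketch deliberately does less than the handler in the question: it does not restore previously installed handlers and does not produce a backtrace.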
Of particular interest in your backtrace is this line:
#3 0x000077e088c024a3 in malloc (size=408) at ../include/rtld-malloc.h:56
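malloc() is not on any list of async-signal-safe functions, and calling malloc() from within a signal handler, even indirectly, is not safe and can cause problems such as deadlocks. In your trace the thread was already inside malloc() (frames #29 to #31), holding the main_arena lock, when the allocator detected corruption and called abort(); the handler then re-entered malloc() through backtrace() (frame #2) and blocked on that same lock.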
Per the backtrace() documentation, backtrace() is async-signal-unsafe for multiple reasons:
Function: int backtrace (void **buffer, int size)
Preliminary: | MT-Safe | AS-Unsafe init heap dlopen plugin lock | AC-Unsafe init mem lock fd | See POSIX Safety Concepts
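The "dlopen" entry is the one visible in frames #15 to #18 of the trace in the question: the first call to backtrace() makes glibc load libgcc_s.so.1, and that load calls malloc(). The backtrace(3) man page notes that if these calls must not allocate (in signal handlers, for example), libgcc has to be loaded beforehand. A minimal sketch of that warm-up follows; preloadUnwinder is a hypothetical name, and this is only a partial mitigation, since backtrace() stays AS-Unsafe for the other reasons listed.

```c
#include <execinfo.h>

/* Hypothetical helper: call once from main(), before the fault handlers
 * are installed, so that the dynamic load of libgcc_s.so.1 (and the
 * malloc() it triggers) happens in a normal context rather than inside
 * the signal handler. Returns the number of frames captured. */
int preloadUnwinder(void)
{
    void *frames[4];
    return backtrace(frames, 4);
}
```

It could be called at the top of RegisterFaultHandler() from the question, for example.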
libunwind provides async-signal-safe functionality, and is available as part of most Linux distros.