c++ oracle-database redhat coredump quickfix

Why does a QuickFIX process on my Redhat server not write its core file where it should?


I have 10 C++ programs running on a Redhat 6.9 server, all using some internally developed libraries. One of the libraries implements logging and keeps file descriptor 3 open for the log file. If any of the processes gets a segmentation violation signal (signal 11), a core file is produced in /tmp, as expected according to /proc/sys/kernel/core_pattern. However, one process in particular does not do this. If it gets a signal 11, it writes the core dump into the log file, which becomes useless because log messages are interleaved with the binary core data. The main difference with this process is that it uses the QuickFIX C++ library, version 1.14.3. I have the source for that library and have searched it to see what it might be doing to cause this. The only signal handler it overrides is the one for SIGPIPE. It opens some files, but does nothing specifically with file descriptor 3. The QuickFIX process uses about 8GB of memory, but processes that use more memory write their core files correctly, so I don't think it is a file size issue.

Any ideas what the QuickFIX library could be doing to keep the core file from going where it should, or what else could be causing this?


Solution

  • Turns out the problem had nothing to do with QuickFIX. It was an Oracle issue. I needed to add:

    DIAG_SIGHANDLER_ENABLED=FALSE
    

    to the $ORACLE_HOME/network/admin/sqlnet.ora file. I am still not sure why this only happened to the QuickFIX process and not to any of the other C++ Oracle processes, but that doesn't matter for now.

    Running strace showed that a SIGKILL signal was interrupting the SIGSEGV handler, and that an Oracle error message immediately preceded this in the strace output. It also showed that Oracle was installing its own signal handlers for many signals, including SIGSEGV. This information led me to another Stack Overflow answer: "Oracle Pro*C/OCI install handlers for SIGSEGV/SIGABRT and friends - why, and how to disable?"
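
For anyone who wants to confirm the same thing from inside the process rather than with strace, here is a minimal sketch that asks the kernel which disposition a signal currently has, and that can put the default back. The helper names `has_default_disposition` and `restore_default_disposition` are hypothetical, not part of Oracle, QuickFIX, or any library mentioned above. The sketch assumes Linux/glibc, where `sa_handler` and `sa_sigaction` share storage. Whether resetting SIGSEGV after the Oracle client has initialised is safe is also an assumption on my part; the sqlnet.ora setting above is the fix that was actually verified.

```cpp
#include <csignal>
#include <signal.h>

// Hypothetical helper: returns true if `signo` still has the default
// disposition (SIG_DFL), i.e. no library has installed its own handler.
// Passing nullptr as the new action makes sigaction() a pure query.
bool has_default_disposition(int signo) {
    struct sigaction current {};
    if (sigaction(signo, nullptr, &current) != 0) {
        return false;  // query failed (e.g. invalid signal number)
    }
    // Assumption: Linux/glibc, where sa_handler and sa_sigaction are a union,
    // so this comparison also covers handlers installed with SA_SIGINFO.
    return current.sa_handler == SIG_DFL;
}

// Hypothetical helper: restores the default disposition for `signo`, so that
// a fatal signal such as SIGSEGV goes back to the kernel's default action
// (core dump according to core_pattern). Returns true on success.
bool restore_default_disposition(int signo) {
    struct sigaction dfl {};
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    dfl.sa_flags = 0;
    return sigaction(signo, &dfl, nullptr) == 0;
}
```

Calling `has_default_disposition(SIGSEGV)` once before and once after the database connection is opened should show whether the client library replaced the handler. If it did, `restore_default_disposition(SIGSEGV)` is one possible workaround when editing sqlnet.ora is not an option.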