java, jvm, jvm-arguments, jvm-crash

How can I get the JVM to exit quickly after a SIGSEGV crash?


We have a service that crashes frequently due to some issue with TensorFlow Java. We can live with that (K8s restarts it, and there are lots of instances). The problem is that the JVM takes several minutes to die after the crash. Is there some way to force a quick exit on SIGSEGV in native code?

corrupted size vs. prev_size while consolidating
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007fe4f321a898, pid=1, tid=545
#
# JRE version: OpenJDK Runtime Environment Zulu21.28+85-CA (21.0+35) (build 21+35)
# Java VM: OpenJDK 64-Bit Server VM Zulu21.28+85-CA (21+35, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# C  [libc.so.6+0x28898]  abort+0x178
#
# Core dump will be written. Default location: /data/core
#
# An error report file with more information is saved as:
# /data/hs_err_pid1.log

Some minutes later:

# [ timer expired, abort... ]
[thread 1037 also had an error]

Solution

  • Add the following JVM options:

    -XX:+SuppressFatalErrorMessage -XX:-CreateCoredumpOnCrash
    

    This forces the JVM to terminate immediately on SIGSEGV, without writing an error report or core dump. If you still want to see the fatal error message, replace -XX:+SuppressFatalErrorMessage with -XX:ErrorLogTimeout=1, which limits the time spent writing the error log to one second.
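
    To confirm that the options actually reached the JVM (for example, when they are passed through a container entrypoint), a small check along these lines may help. It is only a sketch, assuming a HotSpot-based JVM such as the Zulu build in the question; the class name FastExitFlagCheck and the helper describe are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

// Hypothetical helper class: prints the effective values of the crash-related flags.
public class FastExitFlagCheck {

    // Returns "name=value (origin)" for a HotSpot flag,
    // or a short note if this JVM does not know the flag.
    public static String describe(String flagName) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            VMOption opt = bean.getVMOption(flagName);
            return opt.getName() + "=" + opt.getValue() + " (" + opt.getOrigin() + ")";
        } catch (IllegalArgumentException e) {
            // getVMOption throws this when the flag does not exist in this JVM.
            return flagName + " is not available on this JVM";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "SuppressFatalErrorMessage",
            "CreateCoredumpOnCrash",
            "ErrorLogTimeout"
        };
        for (String flag : flags) {
            System.out.println(describe(flag));
        }
    }
}
```

    Run it with the same options as the service, e.g. java -XX:+SuppressFatalErrorMessage -XX:-CreateCoredumpOnCrash FastExitFlagCheck.java. Flags that were picked up should show the values you set and an origin other than DEFAULT.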