Tags: python, linux, signals, illegal-instruction

How exactly is SIGILL generated?


I have a program that uses TensorFlow on unsupported hardware, so every time I run it I get the "Illegal instruction (core dumped)" error.

My main goal is to capture this error. I don't want to solve it.

The error is not printed to my program's stderr; it is printed to bash's stderr.

Then my program exits with status 33792, which is exit code 132 (33792 = 132 × 256), and 132 = 128 + 4 means it was killed by SIGILL.
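For reference, the numbers above can be decoded with the standard os and signal modules. This is a minimal sketch that assumes 33792 is a raw wait status (for example, the value returned by os.system(), which runs the command through a shell). A shell reports "child killed by signal N" as exit code 128 + N, and SIGILL is signal 4 on Linux. The helper name describe_wait_status is purely illustrative.

```python
import os
import signal


def describe_wait_status(status: int) -> str:
    """Explain a raw wait status such as the value returned by os.system()."""
    if os.WIFSIGNALED(status):
        # The process itself was terminated by a signal.
        return f"killed by {signal.Signals(os.WTERMSIG(status)).name}"
    if os.WIFEXITED(status):
        code = os.WEXITSTATUS(status)  # high byte of the status: 33792 >> 8 == 132
        # Shell convention: a child killed by signal N is reported as 128 + N.
        if code > 128:
            try:
                name = signal.Signals(code - 128).name
                return f"exit code {code} (shell convention: {name})"
            except ValueError:
                pass  # not a valid signal number, fall through
        return f"exit code {code}"
    return "unknown status"


print(describe_wait_status(33792))  # prints: exit code 132 (shell convention: SIGILL)
```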

And I cannot capture it using the method mentioned here, because I'm running my command with docker run and I can't pass it the curly braces.

Is there any way to capture the stdout of bash without the curly brackets?

Also, how exactly is SIGILL generated? What exactly is happening behind the scenes? Is SIGILL triggered in the parent process (bash in my case) and passed to the child process (my program), or vice versa?

I tried adding a SIGILL handler to my program to see if I could capture it, but my program froze instead of printing the "Illegal instruction" error.

I'm using Debian 11 and my program is written in Python.

Edit: The SIGILL kills my Python program. My goal is to catch the SIGILL from inside my program, print an error message, and then terminate the program.

I don't want the "Illegal instruction" error to be printed to bash's stderr; I want it printed to my program's stderr or stdout.

Edit: here's the SIGILL handler I have in my code:

  import signal

  def sigill_handler(sig, frame):
      print("Illegal Instruction. terminating.")

  # Register the handler at module level (not inside the handler itself)
  signal.signal(signal.SIGILL, sigill_handler)

Note that this is the only signal I'm handling in my code.


Solution

  • Citing https://docs.python.org/3/library/signal.html:

    Execution of Python signal handlers

    A Python signal handler does not get executed inside the low-level (C) signal handler. Instead, the low-level signal handler sets a flag which tells the virtual machine to execute the corresponding Python signal handler at a later point (for example at the next bytecode instruction). This has consequences:

    • It makes little sense to catch synchronous errors like SIGFPE or SIGSEGV that are caused by an invalid operation in C code. Python will return from the signal handler to the C code, which is likely to raise the same signal again, causing Python to apparently hang. From Python 3.3 onwards, you can use the faulthandler module to report on synchronous errors.

    • A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.

    • If the handler raises an exception, it will be raised “out of thin air” in the main thread. See the note below for a discussion.

    According to https://docs.python.org/3/library/faulthandler.html, all faulthandler can do is dump a Python traceback, so it does not meet your requirement.
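To see what faulthandler does offer, here is a minimal sketch. As an assumption for the demo, the fault is simulated by sending SIGILL to the child process with os.kill(); a real illegal instruction comes from the CPU, not from kill(). faulthandler writes a "Fatal Python error: Illegal instruction" report plus the traceback to the file you pass it, then lets the default action run, so the process is still killed by SIGILL and a shell parent will still print its own message.

```python
import signal
import subprocess
import sys
import textwrap

# Child program: enable faulthandler, then simulate the crash.
# os.kill() stands in for a real illegal instruction (an assumption for this demo).
CHILD = textwrap.dedent("""
    import faulthandler, os, signal, sys
    faulthandler.enable(file=sys.stderr)
    os.kill(os.getpid(), signal.SIGILL)
""")


def run_child() -> subprocess.CompletedProcess:
    """Run the child with stdout/stderr captured as text."""
    return subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True)


if __name__ == "__main__":
    result = run_child()
    # faulthandler wrote its report to the child's own stderr...
    print(result.stderr)
    # ...but the child was still terminated by SIGILL afterwards.
    print("killed by SIGILL:", result.returncode == -signal.SIGILL)
```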

    What you could do is run your possibly failing program from your own wrapper program, which can check the child's wait status and decide what to show the user if the program was killed by SIGILL.
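This also answers where SIGILL comes from: on Linux the CPU traps on an instruction it cannot execute, and the kernel delivers SIGILL to the thread that executed it, i.e. to your Python process, not to bash. Bash, as the parent, only learns about it afterwards from the wait status, and that is when it prints "Illegal instruction (core dumped)". A wrapper can perform the same check itself. The sketch below is one possible shape for it; the script name wrapper.py and the message text are hypothetical. Because no shell is involved, nothing else prints "Illegal instruction".

```python
import signal
import subprocess
import sys


def run_and_report(cmd: list[str]) -> int:
    """Run cmd as a child process and turn a SIGILL death into a readable message.

    Returns an exit code suitable for sys.exit().
    """
    proc = subprocess.run(cmd)  # no shell, so no "Illegal instruction" from bash
    # subprocess reports "killed by signal N" as a negative return code -N.
    if proc.returncode == -signal.SIGILL:
        print("Illegal instruction: this CPU is not supported. Terminating.",
              file=sys.stderr)
        return 128 + signal.SIGILL  # mimic the shell's convention (132)
    if proc.returncode < 0:
        print(f"Program killed by {signal.Signals(-proc.returncode).name}",
              file=sys.stderr)
        return 128 - proc.returncode
    return proc.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python wrapper.py <command> [args...]
    sys.exit(run_and_report(sys.argv[1:]))
```

With Docker, the wrapper becomes the container command, e.g. docker run <image> python wrapper.py python my_program.py, where my_program.py stands for your real script.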

    It would be better to check whether your program is running on a supported platform before importing or using TensorFlow.
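A sketch of such a check, assuming Linux on x86 (where /proc/cpuinfo has a "flags" line) and assuming the missing feature is AVX. The official TensorFlow binaries have used AVX instructions since version 1.6, which is a common cause of this crash on older CPUs, but your build may need other flags. The function names are illustrative.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Collect the CPU feature flags listed in /proc/cpuinfo-style text (x86 Linux)."""
    flags: set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the exact "flags" key; ignores e.g. "vmx flags" on newer kernels.
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def missing_cpu_features(required: set[str],
                         cpuinfo_path: str = "/proc/cpuinfo") -> set[str]:
    """Return the required flags this machine lacks (an empty set means all present)."""
    try:
        text = Path(cpuinfo_path).read_text()
    except OSError:
        return set(required)  # cannot tell, so treat the platform as unsupported
    return required - cpu_flags(text)
```

At startup, call missing_cpu_features({"avx"}) and, if the result is non-empty, print your own error and exit before import tensorflow runs.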