c++ error-handling mmap memcpy

How to Recover in C++ from memcpy "Bus Error" from FPGA?


Everything I have read suggests that memcpy does not throw an exception, so try-catch statements cannot be used to handle such an error. The hardware team has given me memory addresses and ranges, which I access through mmap, but there have been some integration issues (i.e. things for them to fix).

One DDR channel works perfectly while the same code doing the same operations often dies for another channel. The program simply halts with "Bus Error" printed on the terminal.

Once this is figured out, memory interactions should be much smoother, but this interface accepts memory operations from another device (i.e. another team). I can try to validate any incoming operations, but the hardware team could still do something unexpected that changes what is valid, or a valid operation could simply result in "Bus Error".

So how can I keep my C++ application from dying due to future/unexpected changes from another team? Do I need to set up a signal handler? Are there other options?


Solution

  • Expanding on @user2725742's suggestion of using long jumps, here is a version that converts the signal into an exception. I use it to catch both bus errors and segmentation faults because, frankly, I don't know a good way of generating bus errors.

    We start by defining an RAII class to set and restore signal handlers since we will mix exceptions and global state changes. This is a rather minimal, non-moveable version but sufficient for our needs.

    #include <signal.h>
    
    #include <cerrno>
    #include <system_error>
    
    class SignalRegistration
    {
        int sig;
        struct sigaction oldaction;
    public:
        using action_fun = void (*)(int, siginfo_t*, void*);
    
        SignalRegistration(int sig, action_fun fun)
        : sig(sig)
        {
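            struct sigaction action {};
            action.sa_sigaction = fun;
            action.sa_flags = SA_SIGINFO;  // handler receives a siginfo_t
            sigemptyset(&action.sa_mask);  // block no additional signals in the handler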
            if(sigaction(sig, &action, &oldaction))
                throw std::system_error(
                      errno, std::generic_category(), "sigaction");
        }
        SignalRegistration(const SignalRegistration&) = delete;
        SignalRegistration& operator=(const SignalRegistration&) = delete;
        ~SignalRegistration()
        { sigaction(sig, &oldaction, nullptr); }
    };
    

    We can now define the actual handler for the signals. The idea is this:

    1. We activate signal handlers for SIGBUS and SIGSEGV
    2. We set the state for a longjmp
    3. If the signal handler is invoked, it saves the signal state and returns via longjmp
    4. The longjmp returns to the place where it was set. There the signal state is converted into an exception

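    To make this work in a multithreaded environment, the longjmp buffer and the saved signal state need to be thread-local variables. Keep in mind that handlers installed with sigaction are process-wide, so only the jump state is per-thread: a fault in a thread that has not called sigsetjmp would jump through an unset buffer.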

    #include <setjmp.h>
    // using sigjmp_buf, sigsetjmp, siglongjmp
    
    #include <sstream>
    // using std::ostringstream
    
    class FaultHandler
    {
    public:
        struct SigState
        {
            sigjmp_buf env;
            siginfo_t info;
        };
        // One jump buffer and signal record per thread (inline variable: C++17)
        static inline thread_local SigState instance = {};
    private:
        static void sighandler(int /*sig*/, siginfo_t* siginfo, void* /*ucontext*/)
        {
            instance.info = *siginfo;
            siglongjmp(instance.env, 1);
        }
        SignalRegistration bus, segv;
    
    public:
        FaultHandler()
        : bus(SIGBUS, &FaultHandler::sighandler),
          segv(SIGSEGV, &FaultHandler::sighandler)
        {}
        [[noreturn]] static void raise()
        {
            std::ostringstream ss;
            ss << "Received signal " << instance.info.si_signo
               << " code " << instance.info.si_code
               << " address " << instance.info.si_addr;
            int errcode = instance.info.si_signo == SIGBUS ? EIO : EFAULT;
            throw std::system_error(errcode, std::generic_category(), ss.str());
        }
    };
    

    Now we can use this to guard the memcpy or some other functions. Specific requirements:

    1. They must not create objects with non-trivial destructors. To be precise: if the signal handler is invoked, automatic variables created between sigsetjmp and siglongjmp are abandoned without their destructors being run
    2. The function should be async-signal-safe. See the section Undefined behavior in man 3 siglongjmp (curiously absent from other man pages with the same title)

    Anyway, memcpy is safe to use with this:

    #include <cstring>
    // using std::memcpy
    
    void* memcpy_fault(void* out, const void* in, std::size_t nbytes)
    {
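        FaultHandler check; // installs the handlers; restores the old ones on scope exit
        // sigsetjmp returns 0 when called directly, nonzero after siglongjmp
        if(sigsetjmp(FaultHandler::instance.env, 1))
            FaultHandler::raise();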
        return std::memcpy(out, in, nbytes);
    }
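
    If exceptions are not wanted at the call site, the same sigsetjmp/siglongjmp idea can report the fault as an error code instead. The following is only a minimal sketch, not the version above: memcpy_nothrow and its helper names are hypothetical, it uses a plain sa_handler rather than SA_SIGINFO, and like the original it swaps process-wide signal handlers, so it assumes no other thread does the same at that moment.

```cpp
#include <setjmp.h>
#include <signal.h>

#include <cerrno>
#include <cstddef>
#include <cstring>

// Hypothetical helper names; per-thread jump state as in the text above.
static thread_local sigjmp_buf copy_env;
static thread_local volatile sig_atomic_t copy_signo = 0;

static void copy_handler(int sig)
{
    copy_signo = sig;         // remember which signal fired
    siglongjmp(copy_env, 1);  // resume at the sigsetjmp call
}

// Returns 0 on success, EIO after SIGBUS, EFAULT after SIGSEGV
// (the same mapping the exception version uses), or the errno
// reported by sigaction if a handler could not be installed.
int memcpy_nothrow(void* out, const void* in, std::size_t nbytes) noexcept
{
    struct sigaction action {}, old_bus {}, old_segv {};
    action.sa_handler = copy_handler;
    sigemptyset(&action.sa_mask);
    if(sigaction(SIGBUS, &action, &old_bus))
        return errno;
    if(sigaction(SIGSEGV, &action, &old_segv)) {
        int saved = errno;
        sigaction(SIGBUS, &old_bus, nullptr);
        return saved;
    }
    int result = 0;
    if(sigsetjmp(copy_env, 1) == 0)
        std::memcpy(out, in, nbytes);  // only trivial work inside the guard
    else
        result = copy_signo == SIGBUS ? EIO : EFAULT;
    // Restore the previous handlers on both paths.
    sigaction(SIGSEGV, &old_segv, nullptr);
    sigaction(SIGBUS, &old_bus, nullptr);
    return result;
}
```

    A caller would check the return value, for example treating EIO as "the device rejected this access" and retrying or skipping the operation. After a fault, the destination buffer may be partially written.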
    

    And here is a quick test to see it working:

    #include <iostream>
    
    int main()
    {
        char buf[10];
        try {
            // Reading from a null pointer is expected to raise SIGSEGV here
            memcpy_fault(buf, nullptr, 4);
        } catch(std::system_error& err) {
            std::cout << "Caught exception: " << err.what() << '\n';
        }
    }
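
    The test above only exercises SIGSEGV. On Linux, one way to provoke a genuine SIGBUS without any hardware is to touch a mapped page that lies entirely past the end of its backing file. The following is a minimal sketch under that assumption: map_past_eof and probe_signal are hypothetical helpers, not part of the code above, and the /tmp path is assumed to be writable.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

// All names below are hypothetical helpers for this demonstration.
static thread_local sigjmp_buf probe_env;
static thread_local volatile sig_atomic_t probe_signo = 0;

static void probe_handler(int sig)
{
    probe_signo = sig;
    siglongjmp(probe_env, 1);
}

// Maps one page of an empty, already unlinked temporary file.
// The mapping itself succeeds, but the page has no backing data.
// Returns nullptr if any step fails. The mapping is intentionally
// left in place so the caller can probe it.
void* map_past_eof()
{
    char path[] = "/tmp/sigbus_demo_XXXXXX";  // assumption: /tmp is writable
    int fd = mkstemp(path);
    if(fd < 0)
        return nullptr;
    unlink(path);
    long page = sysconf(_SC_PAGESIZE);
    void* addr = mmap(nullptr, static_cast<size_t>(page),
                      PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    return addr == MAP_FAILED ? nullptr : addr;
}

// Reads one byte from addr. Returns 0 if the read succeeded,
// otherwise the number of the signal (SIGBUS or SIGSEGV) it raised.
int probe_signal(const void* addr)
{
    struct sigaction action {}, old_bus {}, old_segv {};
    action.sa_handler = probe_handler;
    sigemptyset(&action.sa_mask);
    sigaction(SIGBUS, &action, &old_bus);
    sigaction(SIGSEGV, &action, &old_segv);
    int signo = 0;
    if(sigsetjmp(probe_env, 1) == 0) {
        // volatile forces the compiler to really perform the read
        char byte = *static_cast<const volatile char*>(addr);
        (void)byte;
    } else {
        signo = probe_signo;
    }
    sigaction(SIGSEGV, &old_segv, nullptr);
    sigaction(SIGBUS, &old_bus, nullptr);
    return signo;
}
```

    With these helpers, probe_signal(map_past_eof()) should report SIGBUS, while probing the address of an ordinary variable returns 0. Passing the pointer from map_past_eof as the source of memcpy_fault should then exercise the SIGBUS branch of the exception version as well.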