
Restarting an Erlang node after a segmentation fault


I'm currently running an Erlang application that calls C code through NIFs. However, if a segmentation fault occurs within the C code, the entire node, that is, the Erlang virtual machine the application runs on, goes down.

What is the best way to monitor the Erlang application and restart it if the virtual machine dies?


Solution

  • Have a look at Heart, which can restart the VM if it crashes or stops responding.

    In addition, if you have NIF calls that are considered dangerous, it is recommended to isolate them, together with the Erlang code closely tied to them, on a separate node. There are several ways to monitor and restart a node (e.g. the slave module).

    Generally, however, I would advise against using problematic NIFs; depending on what you are using them for, there are more stable alternatives.

    Reason for NIF -> replacement

    Sequential speed -> better-optimized Erlang code. The high sequential speed of NIFs often comes at the price of interfering with Erlang's schedulers, which frequently results in worse overall performance.

    Interfacing with external libs/apps -> Erlang's ports are much better at failure isolation, because a crash in the external program kills only that OS process, not the VM.
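The Heart suggestion can be illustrated with a minimal sketch, assuming the VM is started with the -heart flag and the HEART_COMMAND environment variable set to a restart script. The module name heart_setup and the script path are hypothetical; heart:set_cmd/1 is part of OTP's kernel heart module and replaces the restart command at runtime.

```erlang
-module(heart_setup).
-export([set_restart_command/1]).

%% Sketch only. Heart runs only if the VM was started with -heart, e.g.:
%%   erl -heart -env HEART_COMMAND "/path/to/start_node.sh" ...
%% (the script path is hypothetical). If the VM crashes or stops sending
%% heartbeats, the external heart program runs HEART_COMMAND.
set_restart_command(Cmd) when is_list(Cmd) ->
    case whereis(heart) of
        undefined ->
            %% VM not started with -heart: there is no heart process,
            %% so heart:set_cmd/1 cannot be used.
            {error, heart_not_running};
        _Pid ->
            %% Returns ok, or {error, {bad_cmd, Cmd}} if Cmd is rejected.
            heart:set_cmd(Cmd)
    end.
```

Setting HEART_COMMAND in the start script is usually enough; changing it at runtime is only useful if the restart procedure depends on state the node learns after booting.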
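The idea of isolating dangerous NIFs on a separate, monitored node can be sketched as a watcher process. The answer mentions Slave; since OTP 25 the slave module is deprecated in favor of peer, so this sketch uses peer instead. It assumes the current VM is distributed (e.g. started with erl -sname master); the module name nif_node_watcher and the application name my_nif_app are hypothetical.

```erlang
-module(nif_node_watcher).
-export([start_link/1]).

%% Sketch only: keeps a separate node (short name Name) alive so that a
%% segfault in a NIF loaded there takes down that node, not this one.
start_link(Name) when is_atom(Name) ->
    spawn_link(fun() -> watch(Name) end).

watch(Name) ->
    case peer:start(#{name => Name}) of
        {ok, Peer, Node} ->
            %% Receive {nodedown, Node} when the connection to Node is lost.
            true = erlang:monitor_node(Node, true),
            %% Hypothetical: start the NIF-backed code on the isolated node,
            %% e.g. peer:call(Peer, application, ensure_all_started, [my_nif_app]).
            receive
                {nodedown, Node} ->
                    %% Clean up the controlling process if it still exists,
                    %% back off briefly, then start a fresh node.
                    catch peer:stop(Peer),
                    timer:sleep(1000),
                    watch(Name)
            end;
        {error, Reason} ->
            exit({isolated_node_start_failed, Reason})
    end.
```

Callers on the main node would then talk to the isolated node (for example with rpc:call/4) and treat errors from it as recoverable, instead of running the NIF locally.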
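The port alternative can be sketched as follows, assuming the C code is moved into a standalone executable; the module name port_guard and any executable path you pass in are hypothetical. A segfault in that program ends only the OS process: the VM receives {Port, {exit_status, Status}} (on Unix, a process killed by a signal is reported as 128 plus the signal number, the same convention shells use) and can simply start the program again.

```erlang
-module(port_guard).
-export([run/2, run_with_restarts/3]).

%% Run an external program through a port and collect its stdout.
%% If the program crashes, only that OS process dies; we get a message.
run(Path, Args) ->
    Port = open_port({spawn_executable, Path},
                     [{args, Args}, exit_status, binary]),
    collect(Port, []).

collect(Port, Acc) ->
    receive
        {Port, {data, Data}} ->
            collect(Port, [Data | Acc]);
        {Port, {exit_status, 0}} ->
            {ok, iolist_to_binary(lists:reverse(Acc))};
        {Port, {exit_status, Status}} ->
            %% Non-zero exit or signal (e.g. SIGSEGV): the VM keeps running.
            {error, {exit_status, Status}}
    end.

%% Retry a crashing program up to Retries more times.
run_with_restarts(Path, Args, Retries) ->
    case run(Path, Args) of
        {ok, _} = Ok -> Ok;
        {error, _} = Err when Retries =< 0 -> Err;
        {error, _} -> run_with_restarts(Path, Args, Retries - 1)
    end.
```

For example, port_guard:run_with_restarts("/usr/local/bin/my_native_worker", [], 3) would retry a hypothetical worker binary up to three times. A long-running worker would normally be kept open with {packet, 2} framing and owned by a supervised gen_server rather than restarted per call.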