Tags: ocaml, dynamic-linking, ocamlbuild

OCaml call to Dynlink causes seg fault


I have an OCaml program that writes another OCaml program, compiles it and then tries to dynamically load it. Unfortunately this causes a segmentation fault on my OSX 10.14 machine, OCaml 4.07.1.

In particular my program is structured as follows:

open Helper
module type PLUGIN_TYPE = sig ... end

let plugin = ref None
let get_plugin () : (module PLUGIN_TYPE) =
  match !plugin with
  | Some x -> x
  | None -> failwith "No plugin loaded"

module Test = struct
... get_plugin () ...
end
module Plugin : PLUGIN_TYPE = struct
...
end

let () = A.plugin := Some (module Plugin : PLUGIN_TYPE)

I use ocamlbuild to build the main program and then ocamlbuild again to build the plugin (which requires the same Helper modules/files as the main program).

When I try to run this I get a segfault, presumably around the time Dynlink.loadfile is executed. I am not sure what I am doing wrong. The fact that I am linking the Helper modules into both the main program and the plugin makes me uncomfortable, but I am not sure how to work around it.

Attaching an LLDB trace:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001002624da Main.native`caml_oldify_local_roots at roots.c:286 [opt]
    frame #1: 0x00000001002664fb Main.native`caml_empty_minor_heap at minor_gc.c:352 [opt]
    frame #2: 0x0000000100266cc5 Main.native`caml_gc_dispatch at minor_gc.c:446 [opt]
    frame #3: 0x000000010026dca6 Main.native`caml_make_vect(len=<unavailable>, init=<unavailable>) at array.c:335 [opt]
    frame #4: 0x0000000100114eb9 Main.native`camlLru_cache__init_inner_2624 + 89
    frame #5: 0x0000000100087ea6 Main.native`camlSyntax__memoize_7621 + 38
    frame #6: 0x000000010312d317 Plugin.cmxs`camlInterp__entry + 311
    frame #7: 0x0000000100283424 Main.native`caml_start_program + 92
    frame #8: 0x000000010027ad19 Main.native`caml_callback(closure=<unavailable>, arg=<unavailable>) at callback.c:173 [opt]
    frame #9: 0x000000010027f6a0 Main.native`caml_natdynlink_run(handle_v=4345299456, symbol=72181230668639817) at natdynlink.c:141 [opt]
    frame #10: 0x000000010009d727 Main.native`camlDynlink__fun_2440 + 23
    frame #11: 0x0000000100183581 Main.native`camlStdlib__list__iter_1148 + 33
    frame #12: 0x000000010009d5bc Main.native`camlDynlink__loadunits_2288 + 332
    frame #13: 0x000000010009d788 Main.native`camlDynlink__load_2301 + 72
    frame #14: 0x000000010000552c Main.native`camlLoader__load_plugin_1002 + 268
    frame #15: 0x00000001000055d8 Main.native`camlLoader__simulate_1056 + 120
    frame #16: 0x00000001000052c8 Main.native`camlMain__entry + 280
    frame #17: 0x0000000100002489 Main.native`caml_program + 3481
    frame #18: 0x0000000100283424 Main.native`caml_start_program + 92
    frame #19: 0x00000001002617dc Main.native`caml_startup_common(argv=0x00007ffeefbff538, pooling=<unavailable>) at startup.c:157 [opt]
    frame #20: 0x000000010026184b Main.native`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup.c:162 [opt]
    frame #21: 0x0000000100261844 Main.native`caml_main [inlined] caml_startup(argv=<unavailable>) at startup.c:167 [opt]
    frame #22: 0x0000000100261844 Main.native`caml_main(argv=<unavailable>) at startup.c:174 [opt]
    frame #23: 0x00000001002618bc Main.native`main(argc=<unavailable>, argv=<unavailable>) at main.c:44 [opt]
    frame #24: 0x00007fff6d4f1ed9 libdyld.dylib`start + 1
    frame #25: 0x00007fff6d4f1ed9 libdyld.dylib`start + 1

For what it's worth, these frames belong to what I called the Helper modules:

    frame #4: 0x0000000100114eb9 Main.native`camlLru_cache__init_inner_2624 + 89
    frame #5: 0x0000000100087ea6 Main.native`camlSyntax__memoize_7621 + 38

Any clues on what I am doing wrong?


Solution

  • TL;DR: This is a known bug. Use Dune if possible. If not, use the Findlib Dynload facility (Fl_dynload) manually; it takes some work but is doable. You're not the first one to hit this problem.

    The Problem

    First of all, you're doing everything right: this is a relatively well-known, long-standing bug in OCaml that was resolved only recently. Don't worry, there are a couple of workarounds (described below). Also, FYI, if you get a segfault without touching the Obj module or external (C) stubs, then it is definitely a bug in the OCaml system, and you can go directly to the OCaml issue tracker. Fortunately, this happens very rarely.

    Now, what is happening? The problem is that the OCaml dynamic linker does not check whether a compilation unit is already loaded. Therefore, the unit you load may already be in the process image, or it may in turn pull in another unit that is. When a unit is loaded into the OCaml process image, its constructor (the initialization function) is called, which sets the initial roots (global variables) and initializes the frames. If the unit was already initialized, this wreaks havoc: variables are reset and values are overwritten. If you're lucky, you get a segmentation fault from the garbage collector, and this is what happens in your case.

    Solutions

    The fix was merged in OCaml 4.08, but you probably won't be entirely happy with it. You won't get a segfault; instead, your program will fail gracefully with an error indicating that you're trying to load a compilation unit that is already in the process image (the Dynlink.Error (Module_already_loaded "module name") exception). So it is still the responsibility of the plugin system developer to maintain the list of already loaded modules.
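To make the 4.08 behaviour concrete, here is a minimal sketch of a host-side wrapper that separates the "already loaded" case from other Dynlink errors. The names outcome, outcome_of_error and load_plugin are made up for illustration; only Dynlink.loadfile, Dynlink.Error, Dynlink.Module_already_loaded and Dynlink.error_message come from the standard dynlink library. It assumes OCaml 4.08 or later and that the program is linked against dynlink.

```ocaml
(* Needs the dynlink library and OCaml >= 4.08; the
   Module_already_loaded constructor does not exist before that. *)

(* Hypothetical result type, used only in this sketch. *)
type outcome =
  | Loaded
  | Duplicate_unit of string  (* the unit is already in the process image *)
  | Failed of string          (* any other Dynlink error, as a message *)

let outcome_of_error = function
  | Dynlink.Module_already_loaded unit_name -> Duplicate_unit unit_name
  | e -> Failed (Dynlink.error_message e)

(* [path] is the plugin file, e.g. a .cmxs for a native host. *)
let load_plugin path =
  match Dynlink.loadfile path with
  | () -> Loaded
  | exception Dynlink.Error e -> outcome_of_error e
```

With a plugin that bundles the Helper modules, as in the question, a wrapper like this would report a Duplicate_unit instead of crashing, but the plugin would still not be loaded. That is why the unit bookkeeping has to happen before Dynlink is called.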

    Most likely, you don't want to develop a new system. The good news is that such systems already exist, and they work even with older versions of OCaml, where they keep the runtime from segfaulting.

    I will provide two solutions below. Both rely on the Findlib Dynload facility: when a program (or shared object) is compiled, it records the list of compilation units that constitute it inside the program itself. That list can be consulted later to decide whether a unit should be loaded and whether it is consistent with the units already loaded (e.g., we don't want multiple versions of the same library in the process image).

    Dune

    Solution number one is to use Dune, if only because it requires the least work. Dune was designed from scratch to work correctly with Findlib, so everything should work out of the box. You just need to port your project to Dune, specify findlib.dynload as a dependency of your host program (the program that loads the plugins), and use Fl_dynload.load_packages to load your plugins.
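A rough sketch of what the host side could look like under Dune. The package name "myapp-plugin" and the function load_plugins are hypothetical, and the dune stanza in the comment is only an assumed minimal setup; Fl_dynload.load_packages and the Fl_package_base.No_such_package exception are part of Findlib.

```ocaml
(* Assumed dune stanza for the host program (adapt to your project):

   (executable
    (name main)
    (libraries findlib.dynload))

   The plugin is assumed to be a separate dune library installed under
   the made-up findlib name "myapp-plugin". *)

let load_plugins packages =
  try
    (* Loads the packages and their dependencies, skipping whatever
       Findlib has recorded as already present in the host. *)
    Fl_dynload.load_packages packages;
    Ok ()
  with
  | Fl_package_base.No_such_package (pkg, _) ->
    Error ("findlib does not know package " ^ pkg)
  | Dynlink.Error e -> Error (Dynlink.error_message e)

let () =
  match load_plugins ["myapp-plugin"] with
  | Ok () -> print_endline "plugins loaded"
  | Error msg -> prerr_endline ("plugin loading failed: " ^ msg)
```

The plugin itself can keep registering through a reference in the host, as in the question; the difference is that Findlib, not the plugin's own link step, decides which libraries still need to be loaded.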

    OCamlbuild/OASIS

    If for some reason you can't move your project to Dune, then you have to do some work yourself. We have implemented our own plugin loading system as part of the BAP project, so you can build your own system based on it. It is under the MIT license, so feel free to grab any code you like and modify it to your taste. Our system provides a little more than you might need (we make our plugins self-contained, pack them as zip files, etc.), but the idea is the same: use Fl_dynload and keep track of what you're loading. As always, the devil is in the details. If you're using OASIS or ocamlbuild to build a non-trivial project (and if your project is trivial, then just port it to Dune), the caveat is that when ocamlbuild links an internal library (i.e., a library from your source tree), it doesn't use OCamlFind, and therefore the linked modules are not reported to the Dynload facility. So we had to write an ocamlbuild plugin that does this.

    Basically, your loader must track which compilation units are already loaded, and your plugin must contain meta information that tells the loader which compilation units it requires and which it provides. This requires a good deal of cooperation between all the parts. Here is how it works in BAP:

    1) We have the bapbuild tool, which is ocamlbuild enhanced with an (ocamlbuild) plugin that knows how to build *.plugin files. A .plugin file is a zip file under the hood with a fixed layout (called a bundle in our parlance). It contains a MANIFEST file, which includes the list of required libraries and the list of provided units along with some meta information, and, of course, the cmxs (and cma) for the code itself. Optionally, the bundle may include all the dependent libraries (to make the plugin loadable in environments where the required libraries are not provided). The bapbuild tool packages all the dependencies by default, and since some libraries in the OPAM universe do not provide a cmxs at all, it also builds a cmxs for them and packages it into the plugin.

    2) We have the bap_plugins runtime library which loads plugins, fulfilling their dependencies and ensuring that no units are loaded twice.

    3) The host program (which loads plugins) may (and will) also contain some compilation units, since it is linked from a set of compilation units that are either local to the project tree or come from external libraries. So we need some cooperation from the build system, which has to tell us which units are already loaded (alternatively, we could parse the ELF structures of the host binary, but that doesn't sound like a very portable or robust solution). We use the findlib.dynload library, which enables such cooperation by storing the list of libraries and packages that were used to build a binary in an internal data structure. We wrote a small ocamlbuild plugin that enables this, and the rest is done by ocamlfind (which actually generates a file and links it into the host binary).
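The bookkeeping described in these three points can be sketched in a few lines. This is not BAP's actual code: the manifest and loader types, their field names, and the create and load functions are all made up for illustration. The low-level loading function is passed in as a parameter, so a real host would supply Dynlink.loadfile and a list of units obtained from its build system.

```ocaml
module Units = Set.Make (String)

(* Hypothetical manifest: what a plugin needs and what it brings along. *)
type manifest = {
  name : string;
  requires : string list;  (* units that must already be present *)
  provides : string list;  (* units contained in the plugin's code *)
  file : string;           (* path to the .cmxs or .cma *)
}

type loader = {
  mutable loaded : Units.t;    (* every unit in the process image *)
  load_file : string -> unit;  (* e.g. Dynlink.loadfile in a real host *)
}

(* [host_units] are the units linked into the host binary itself. *)
let create ~host_units ~load_file =
  { loaded = Units.of_list host_units; load_file }

(* Refuse to load a plugin whose requirements are missing or whose units
   would be initialized a second time; otherwise load it and record it. *)
let load t m =
  let missing = List.filter (fun u -> not (Units.mem u t.loaded)) m.requires in
  let clashes = List.filter (fun u -> Units.mem u t.loaded) m.provides in
  match missing, clashes with
  | _ :: _, _ -> Error (`Missing missing)
  | [], _ :: _ -> Error (`Already_loaded clashes)
  | [], [] ->
    t.load_file m.file;
    t.loaded <- Units.union t.loaded (Units.of_list m.provides);
    Ok ()
```

In the situation from the question, a plugin that lists Helper among its provided units would be rejected with `Already_loaded ["Helper"]` before Dynlink ever runs, instead of re-initializing Helper and corrupting the GC roots.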