c++, node.js, v8, magick++, node-addon-api

Magick++/NAPI module memory leak


I'm writing a native module using node-addon-api that makes use of the Magick++ library. The module takes a file path to an image alongside some parameters and returns a buffer. I seem to have come across a pretty bad memory leak, which Massif reports as being related to either the buffer that is created or the Magick++ image. Here's my C++ code:

#include <napi.h>
#include <algorithm>
#include <list>
#include <Magick++.h>

using namespace std;
using namespace Magick;

class FlipWorker : public Napi::AsyncWorker {
 public:
  FlipWorker(Napi::Function& callback, string in_path, bool flop, string type, int delay)
      : Napi::AsyncWorker(callback), in_path(in_path), flop(flop), type(type), delay(delay) {}
  ~FlipWorker() {}

  void Execute() {
    list<Image> frames;
    list<Image> coalesced;
    list<Image> mid;
    list<Image> result;
    readImages(&frames, in_path);
    coalesceImages(&coalesced, frames.begin(), frames.end());

    for (Image &image : coalesced) {
      flop ? image.flop() : image.flip();
      image.magick(type);
      mid.push_back(image);
    }

    optimizeImageLayers(&result, mid.begin(), mid.end());
    if (delay != 0) for_each(result.begin(), result.end(), animationDelayImage(delay));
    writeImages(result.begin(), result.end(), &blob);
  }

  void OnOK() {
    Callback().Call({Env().Undefined(), Napi::Buffer<char>::Copy(Env(), (char *)blob.data(), blob.length())});
  }

 private:
  string in_path, type;
  bool flop;
  int delay;
  Blob blob;
};

Napi::Value Flip(const Napi::CallbackInfo &info)
{
  Napi::Env env = info.Env();

  Napi::Object obj = info[0].As<Napi::Object>();
  Napi::Function cb = info[1].As<Napi::Function>();
  string path = obj.Get("path").As<Napi::String>().Utf8Value();
  bool flop = obj.Has("flop") ? obj.Get("flop").As<Napi::Boolean>().Value() : false;
  string type = obj.Get("type").As<Napi::String>().Utf8Value();
  int delay = obj.Get("delay").As<Napi::Number>().Int32Value();

  FlipWorker* flipWorker = new FlipWorker(cb, path, flop, type, delay);
  flipWorker->Queue();
  return env.Undefined();
}

Napi::Object Init(Napi::Env env, Napi::Object exports) {
  exports.Set(Napi::String::New(env, "flip"), Napi::Function::New(env, Flip));
  return exports;
}

NODE_API_MODULE(addon, Init);

And an example JS script:

const image = require("./build/Release/image.node");

setInterval(() => {
  image.flip({ path: "/home/esm/animated.gif", type: "gif", delay: 0 }, (error, buffer) => {
    console.log(buffer);
    console.log(process.memoryUsage().rss);
  });
}, 10000);

Here is a sample output of the script:

<Buffer 47 49 46 38 39 61 80 02 66 01 f7 00 00 38 44 3a 62 58 26 70 64 27 12 1c 4d 19 26 50 26 30 57 10 38 79 2c 37 67 35 51 57 14 47 79 35 4a 71 55 4f 4f 68 ... 868294 more bytes>
69496832
<Buffer 47 49 46 38 39 61 80 02 66 01 f7 00 00 38 44 3a 62 58 26 70 64 27 12 1c 4d 19 26 50 26 30 57 10 38 79 2c 37 67 35 51 57 14 47 79 35 4a 71 55 4f 4f 68 ... 868294 more bytes>
110673920
<Buffer 47 49 46 38 39 61 80 02 66 01 f7 00 00 38 44 3a 62 58 26 70 64 27 12 1c 4d 19 26 50 26 30 57 10 38 79 2c 37 67 35 51 57 14 47 79 35 4a 71 55 4f 4f 68 ... 868294 more bytes>
152092672
<Buffer 47 49 46 38 39 61 80 02 66 01 f7 00 00 38 44 3a 62 58 26 70 64 27 12 1c 4d 19 26 50 26 30 57 10 38 79 2c 37 67 35 51 57 14 47 79 35 4a 71 55 4f 4f 68 ... 868294 more bytes>
192970752
<Buffer 47 49 46 38 39 61 80 02 66 01 f7 00 00 38 44 3a 62 58 26 70 64 27 12 1c 4d 19 26 50 26 30 57 10 38 79 2c 37 67 35 51 57 14 47 79 35 4a 71 55 4f 4f 68 ... 868294 more bytes>
204517376

As you can see, the resident set size increases significantly each time the function is run. This happens with every image in any format that I use with it. How would I keep the code from leaking? Thanks in advance.

EDIT: I did some more digging, and it turns out that since the buffer is not created through JS, it isn't eligible for garbage collection in the same way. I'm now wondering whether it's possible to create a buffer that gets garbage collected by V8 and still provides the same data.
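
Roughly what I have in mind, as an untested sketch: copy the blob into a plain heap allocation and hand ownership to V8 through the finalizer overload of Napi::Buffer::New, so the garbage collector knows when the memory can be freed (needs <cstring> for memcpy):

void OnOK() {
  // Copy the blob into a heap allocation owned by the Buffer's finalizer.
  size_t length = blob.length();
  char *data = new char[length];
  memcpy(data, blob.data(), length);

  Napi::Buffer<char> buffer = Napi::Buffer<char>::New(
      Env(), data, length,
      [](Napi::Env, char *toFree) { delete[] toFree; });  // freed when the Buffer is collected

  Callback().Call({Env().Undefined(), buffer});
}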


Solution

  • Answering my own question after investigating on and off for a while.

    TL;DR: If you're on a glibc-based Linux distro (such as Ubuntu, Debian, or Red Hat), use an alternative memory allocator (such as mimalloc or tcmalloc) for long-lived applications that create and destroy many threads. This specific issue largely appears to have been a symptom of both memory fragmentation and a long-standing (over 20 years!) issue with memory handling in glibc.

    The short explanation: glibc's default allocator switches between two different syscalls depending on an allocation-size threshold (brk/sbrk for smaller allocations, mmap for larger ones). By default this threshold is dynamic; it starts small but increases over time depending on various factors. Large allocations served from the sbrk-managed heap are sometimes never trimmed, so freed memory is not returned to the system, which causes the growth seen above.

    There are a few workarounds. First, you can call malloc_trim(0) from your native code on a regular basis (via a timer or some other mechanism) to manually return freed heap memory to the OS. You can also pin the M_MMAP_THRESHOLD tunable to a fixed value, either by setting the MALLOC_MMAP_THRESHOLD_ environment variable or by calling mallopt (declared in <malloc.h>) in your native code:

    mallopt(M_MMAP_THRESHOLD, 131072); // always use mmap for allocations of 128 KiB and above
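
    If you'd rather trim from the native side, here's a rough sketch of the malloc_trim approach (assuming glibc and the FlipWorker from the question above; malloc_trim is a glibc extension, so it won't exist on musl-based systems):

    #include <malloc.h>  // glibc extension: malloc_trim, mallopt

    void FlipWorker::Execute() {
      {
        // ...the existing readImages/coalesce/flip/optimize/writeImages work,
        // scoped so the std::list<Image> containers are destroyed first...
      }
      malloc_trim(0);  // hand freed heap pages back to the kernel
    }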
    

    However, one of the better solutions by far is to use an alternative malloc implementation. The most notable, actively maintained ones are mimalloc and tcmalloc; the jemalloc allocator is also quite popular, but it is no longer maintained and the repository has been archived. All of these handle memory allocation differently, and one may work better for your application than the others, so definitely do plenty of testing first before shipping your application with one of these in production!
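
    For example, on most Linux systems you can swap the allocator for the whole Node process without rebuilding anything by preloading it at startup. The library path and script name below are placeholders; adjust them to wherever your distro installs the allocator:

    # preload mimalloc (or libtcmalloc.so) before starting Node
    LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libmimalloc.so node script.js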

    (Switching to a distro that doesn't use glibc, such as Alpine or Chimera Linux, is also an option; however, I wouldn't recommend jumping right in without further research and testing. As with malloc implementations, every application/scenario is different.)

    Related links/discussions: