python c python-3.x numpy

PyArray_SimpleNewFromData


I am trying to write a C function that accepts a NumPy array object, extracts the data, does some manipulations, and returns another C array as a NumPy array object. Everything works, and I use Python wrappers that make manipulation easy on the Python side. However, I am facing a memory leak. I have an output pointer to doubles that I malloc-ed and that I wrap into a NumPy array object just before returning it to the calling Python function:

PyObject *arr;
int nd = 2;
npy_intp dims[] = {5, 10};
double *data = some_function_that_returns_a_double_star(x, y, z);

arr = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, (void *)data);
return arr;

However, this creates a memory leak because data is never freed. After some googling, I found that this is a known problem in such applications and that the solution is non-trivial. The most helpful resource that I found on this is given here. I could not implement the destructor that this page talks about from the given example. Can someone help me with this? More concretely, I am looking for something like:

PyObject *arr;
int nd = 2;
npy_intp dims[] = {5, 10};
double *data = some_function_that_returns_a_double_star(x, y, z);

arr = PyArray_SimpleNewFromData(nd, dims, NPY_DOUBLE, (void *)data);
some_destructor_that_plug_memLeak_due_to_data_star(args);
return arr;

Solution

  • The technique described on the page you linked is a good one: create a Python object that knows how to free your memory when destroyed, and make it the base of the returned array.

    It sounds like you might have been overwhelmed by the complexity of creating a new extension type. Fortunately, that's not necessary. Python comes with a type designed to perform arbitrary C-level cleanup when destroyed: capsules, which bundle together a pointer and a destructor function and call the destructor when the capsule is destroyed.

    To create a capsule for your memory, first, we define a destructor function:

    void capsule_cleanup(PyObject *capsule) {
        void *memory = PyCapsule_GetPointer(capsule, NULL);
        // I'm going to assume your memory needs to be freed with free().
        // If it needs different cleanup, perform whatever that cleanup is
        // instead of calling free().
        free(memory);
    }
    

    Then set a capsule as your array's base with:

    PyObject *capsule = PyCapsule_New(data, NULL, capsule_cleanup);
    PyArray_SetBaseObject((PyArrayObject *) arr, capsule);
    // Do not Py_DECREF the capsule; PyArray_SetBaseObject stole your
    // reference.
    

    And that should ensure your memory gets freed once it's no longer in use.


    You might be tempted to try to solve the problem by setting the array's OWNDATA flag. But that will attempt to free the memory using NumPy's allocator, which may not be the same allocator your code uses. This can produce weird, platform-specific crashes.

    It is only safe to use the OWNDATA flag if you can guarantee the data buffer you're using was actually allocated by NumPy's allocator. NumPy provides the PyDataMem_NEW, PyDataMem_FREE, and PyDataMem_RENEW functions to allocate, free, or reallocate memory with the allocator it uses for array data buffers.

    If you control the allocation of this buffer, you may be able to allocate it with PyDataMem_NEW, in which case you can set the OWNDATA flag and skip the capsule thing. Otherwise, use a capsule.