python python-extensions refcounting

Need guidance regarding reference counting


I'm chasing a memory leak in a long-running process that uses a C extension I wrote. I've been poring over the code and the extension docs, and I believe the code is correct, but I'd like to confirm the reference handling for PyList and PyDict.

From the docs I gather that PyDict_SetItem() does not steal references to the key and value (it takes its own references to both), so I have to DECREF mine after inserting. PyList_SetItem() and PyTuple_SetItem() steal a reference to the inserted item, so I don't have to DECREF. Correct?

Creating a dict:

PyObject *dict = PyDict_New();
if (dict) {
    for (i = 0; i < length; ++i) {
        PyObject *key, *value = NULL;
        int rc = -1; /* stays -1 unless the insert succeeds */
        key = parse_string(ctx); /* returns a PyString */
        if (key) {
            value = parse_object(ctx); /* returns some PyObject */
            if (value) {
                rc = PyDict_SetItem(dict, key, value); /* returns -1 on failure */
                Py_DECREF(value); /* correct? */
            }
            Py_DECREF(key); /* correct? */
        }
        if (rc < 0) {
            Py_DECREF(dict);
            dict = NULL;
            break;
        }
    }
}
return dict;

Creating a list:

PyObject *list = PyList_New(length);
if (list) {
    PyObject *item;
    for (i = 0; i < length; ++i) {
        item = parse_object(ctx); /* returns some PyObject */
        if (item) {
            PyList_SetItem(list, i, item);
            /* No DECREF here */
        } else {
            Py_DECREF(list);
            list = NULL;
            break;
        }
    }
}
return list;

The parse_* functions don't need extra scrutiny: they only create objects on their last line, like this (for example):

return PyLong_FromLong(...);

If they encounter an error, they don't create any object but set an exception earlier in the function body:

return PyErr_Format(...);

EDIT

Here's some output from valgrind --leak-check=full. Clearly my code is leaking memory, but why? Why is PyDict_New at the top of the (recursive) chain? Does that mean that the dict created here doesn't get DECREF'd when the whole thing is garbage collected?

Just to be clear here: When I build a nested data structure of Python types in C and then DECREF the topmost instance, Python will recursively DECREF all the contents of the structure, won't it?

==4357==    at 0x4C29BE3: malloc (vg_replace_malloc.c:299)
==4357==    by 0x4F20DBC: PyObject_Malloc (in /usr/lib64/libpython3.6m.so.1.0)
==4357==    by 0x4FC0F98: _PyObject_GC_Malloc (in /usr/lib64/libpython3.6m.so.1.0)
==4357==    by 0x4FC102C: _PyObject_GC_New (in /usr/lib64/libpython3.6m.so.1.0)
==4357==    by 0x4F11EC0: PyDict_New (in /usr/lib64/libpython3.6m.so.1.0)
==4357==    by 0xE5821BA: parse_dict (parser.c:350)
==4357==    by 0xE581987: parse_object (parser.c:675)
==4357==    by 0xE5821F0: parse_dict (parser.c:358)
==4357==    by 0xE581987: parse_object (parser.c:675)
==4357==    by 0xE5823CE: parse (parser.c:727)

Solution

  • I forgot to Py_DECREF(item) after PyList_Append(list, item) in a seemingly unrelated piece of code. PyList_SetItem() steals a reference to the item; PyList_Append() doesn't, because the list takes its own reference.