I was just reviewing some code that created a C extension module for Python and didn't contain enough error checking. Adding it was easy enough in most cases, but when it came to the module-init function I wasn't sure.
Just for the sake of discussion, let's take the (abridged) module-init function for itertools (yes, the one shipped by CPython):
m = PyModule_Create(&itertoolsmodule);
if (m == NULL)
    return NULL;

for (i = 0; typelist[i] != NULL; i++) {
    if (PyType_Ready(typelist[i]) < 0)
        return NULL;
    name = strchr(typelist[i]->tp_name, '.');
    assert(name != NULL);
    Py_INCREF(typelist[i]);
    PyModule_AddObject(m, name + 1, (PyObject *)typelist[i]);
}
return m;
It does check whether PyModule_Create fails (which is good), and it checks whether PyType_Ready fails (also good), but it doesn't Py_DECREF(m) in that case (which is surprising/confusing), and it completely fails to check whether PyModule_AddObject fails. According to its documentation it can fail:
Add an object to module as name. This is a convenience function which can be used from the module’s initialization function. This steals a reference to value. Return -1 on error, 0 on success.
OK, maybe it seemed like overkill to abort module initialization just because a type couldn't be added. But even if they didn't want to abort creating the module completely: it would leak a reference to typelist[i], correct?
Lots of built-in CPython C modules don't do thorough error checking and handling in the module-init function (which is probably why the C extension I'm fixing doesn't have it either), yet CPython is typically very strict about this kind of issue and about potential leaks. So my question is basically: are the error checks important in the module-init function, especially for the PyModule_Add* functions (like PyModule_AddObject)? Or can they be omitted, as CPython does in many places?
I'm generally in favour of rigorous error checking when using Python's C API - people very often write long, multistep functions, don't check for any errors, and then act puzzled when it mysteriously fails. In this case (module initialization) you can justify being slightly lax with error checking:
The main reason is that these functions will only really fail because of errors in your C code and they will do this repeatably - it's almost impossible that they will fail unpredictably on an unsuspecting user. Taking PyModule_AddObject
as an example, it can fail because:
- m is NULL (you should be checking for this earlier),
- the module has no __dict__ (I don't know how this happens, but I can't see it happening by accident to a module that you've just created), or
- PyDict_SetItemString fails (most likely caused by PyUnicode_FromString failing).

As you point out in the comments, the latter could be caused by a MemoryError (which could happen at any time and is unpredictable). However, when you're getting MemoryErrors from allocating ~10-character strings, it's unlikely that the Python interpreter will manage to keep going much longer.
So I think my conclusion is "if your module seems to be working you probably don't need this error checking, but if things are going wrong then it's useful for finding out where". The one thing I might add is a final check for errors right before you return the module:
if (PyErr_Occurred()) return NULL;

/* or */

if (PyErr_Occurred()) {
    /* print a warning? */
    PyErr_Clear();
    return m;
}
The reason for this is that Python can behave quite oddly if the error indicator is set but you don't return NULL
(you'll get exceptions raised at odd times that don't make sense). Therefore, a quick final check has some value.
With respect to reference handling when module initialization fails: it's obviously "best" to do it right, but I think you can justify skipping it. This is code that runs once (so you can't lose large amounts of memory by repeatedly losing small amounts). If an error occurs then the most likely option is that the program aborts (so all memory is recovered). Even if you don't abort, the size of the leak is likely to be pretty small (~100 bytes, realistically).