pythoncpythonpython-c-api

Why do attributes that are being set by a custom type initializer need to be protected?


The CPython Tutorial defines a custom initializer for a custom type which has the following lines :

if (first) {
    tmp = self->first;
    Py_INCREF(first);
    self->first = first;
    Py_XDECREF(tmp);
}

but the tutorial advises against this simpler but more nefarious version :

if (first) {
    Py_XDECREF(self->first);
    Py_INCREF(first);
    self->first = first;
}

For the following reason :

But this would be risky. Our type doesn’t restrict the type of the first member, so it could be any kind of object. It could have a destructor that causes code to be executed that tries to access the first member; or that destructor could release the Global interpreter Lock and let arbitrary code run in other threads that accesses and modifies our object.

I understand the reason why a multithreaded programming can break the simple version. There can be a race condition where the new self->first is set by one thread and freed by another thread (by way of Py_XDECREF). This is avoided in the correct version because both threads will have had to set self->first before freeing it.

The part I don't understand is this section :

It could have a destructor that causes code to be executed that tries to access the first member

If first is an object that is getting garbage collected, its destructor would have to be called and it should have access to itself through the duration of the destructor. Why can this be dangerous?

Thank you!


Solution

  • Let's suppose we have:

    custom = Custom(...)  # global variable
    
    class SomePyClass:
        def __del__(self):
            # access global variable
            custom.__init__(1, 2, 3)
    

    Let's suppose that custom.first is an instance of SomePyClass and that this instance gets destructed at the line Py_XDECREF(self->first);.

    The arbitrary code in the destructor is going to cause custom.__init__ to be called again. This will cause Py_XDECREF to be called again on an object that really should have a refcount of 0 and which is already in the process of getting destroyed. That alone represents a reference counting error since it's ref-count will end up below 0 (note this may not be exactly what happens since it temporarily gets given a ref-count of 1 for the duration of the destructor, but it absolutely is a ref-counting error).

    Note also the new value of first that is being assigned to in __init__ is also mis-reference-counted: while our instance is decrefed twice, the value that is about to be assigned is replaced without decrefing it.