pythonperformancepython-internals

Performance impact of inheriting from many classes


I am investigating the performance impact of a very broad inheritance setup.

  1. Start with 260 distinct attribute names, from a0 through z9.
  2. Create 260 classes with 1 uniquely-named attribute each. Create one class that inherits from those 260 classes.
  3. Create 130 classes with 2 uniquely-named attributes each. Create one class that inherits from those 130 classes.
  4. Repeat for 52 classes with 5 attributes each, 26 classes with 10 attributes each, and 1 class with all 260 attributes.
  5. Create one instance of each of the five classes, and then measure the time to read (and add together) all 260 attributes on each.

Average performance from 2.5M reads, interleaving in different orders.

 From260: 2.48
 From130: 1.55
  From52: 1.22
  From26: 1.15
AllInOne: 1.00

These values sort of fit on a linear regression...but they don't. And these relationships hold true across many different runs, of various sizes and test orders.

linear relationship

The values come closer to fitting a second-degree polynomial, or exponential fit...but again, the data does not fit so cleanly as to be obvious.

enter image description here

As I massively increase the number of subclasses, will the performance falloff be linear, or non-linear?


Here's some updated code that samples many different subclass combinations up to 2310:

from time import time

TOTAL_ATTRS = 2310

# Create attribute names "a0000" through "a2309"
attr_names = [f"a{i:04d}" for i in range(TOTAL_ATTRS)]

# Map each attribute to a default value (1..2310)
all_defaults = {name: i + 1 for i, name in enumerate(attr_names)}

# The provided factors of 2310
factors = [1, 2, 3, 5, 6, 7, 10, 11, 14, 15, 21, 22, 30, 33, 35, 42,
           55, 66, 70, 77, 105, 110, 154, 165, 210, 231, 330, 385,
           462, 770, 1155, 2310]

# Build a dictionary mapping each factor to a composite class.
# For factor f, create f subclasses each with (2310 // f) attributes,
# then create a composite class inheriting from all f subclasses.
composite_classes = {}
for f in factors:
    group_size = TOTAL_ATTRS // f
    subclasses = []
    for i in range(f):
        group = attr_names[i * group_size:(i + 1) * group_size]
        group_defaults = {name: all_defaults[name] for name in group}
        subclass = type(f"Sub_{f}_{i}", (object,), group_defaults)
        subclasses.append(subclass)
    composite_classes[f] = type(f"From_{f}", tuple(subclasses), {})

iterations = range(0, 1_000)

for n, c in composite_classes.items():
    i = c()
    t = time()
    for _ in iterations:
        for a in attr_names:
            getattr(i, a)
    print(f"Inheriting from {n} subclasses: {time()-t:.3f}s")

and the results, which seem far more linear than polynomial, but which have odd "ledges" in them:

data points with linear fitting

data points with quadratic fitting


Solution

  • The slowdown will be weird and hard to fully predict, depending on subtle details of memory allocation and attribute access order.

    Worst-case, not only will you experience a linear slowdown, you'll slow down completely unrelated attribute accesses in unrelated code.


    CPython has a 4096-entry type attribute cache, and instance attribute lookups do check this cache when looking for an attribute through the class. The cache entry used for an attribute lookup is determined using a simple hash based on the type's version tag and the address of the string object for the attribute being looked up:

    #define MCACHE_HASH(version, name_hash)                                 \
            (((unsigned int)(version) ^ (unsigned int)(name_hash))          \
             & ((1 << MCACHE_SIZE_EXP) - 1))
    
    #define MCACHE_HASH_METHOD(type, name)                                  \
        MCACHE_HASH(FT_ATOMIC_LOAD_UINT32_RELAXED((type)->tp_version_tag),   \
                    ((Py_ssize_t)(name)) >> 3)
    

    If an attribute is found through the cache, the lookup is quick, and doesn't depend on the depth of the inheritance hierarchy. But if an attribute is not found through the cache, the lookup has to go through the class's MRO, one class at a time, performing a dict lookup in each class until it finds (or fails to find) the attribute. This takes an amount of time linear in the number of classes it has to look through.

    Note that because of the descriptor protocol, Python has to do this even if it finds an entry for the attribute directly in the instance dict.


    So the more attributes you use, the greater the chance you run into a hash collision, either with another one of your attributes or with something completely unrelated, like list.append. The longer your MRO, the greater the impact of a collision.

    If two attributes happen to produce a hash collision, then accessing one while the other is in the cache will need to perform a slow MRO search for the attribute. Then it'll evict the other attribute from the cache. If you access this attribute again before the other one, it'll be quick, but the next time you access the attribute that just got evicted, you'll go through another slow MRO search and eviction.

    Because the cache cares about the address of the attribute string instead of the actual attribute name, using a different string object for a lookup will also cause a cache miss. In normal Python code, attribute names get interned, so this isn't a problem, but when you're generating attribute names dynamically, the interning doesn't happen.