Tags: python, python-3.x, performance, magic-methods

The cost of overriding __setattr__() is too high


I want to store the last access time and mark the object as modified whenever an attribute changes, so I wrote a class and overrode its __setattr__ method.

import time

class CacheObject(object):
    __slots__ = ('modified', 'lastAccess')

    def __init__(self):
        # Go through object.__setattr__ directly so the bookkeeping
        # fields don't trigger the override below.
        object.__setattr__(self, 'modified', False)
        object.__setattr__(self, 'lastAccess', time.time())

    def setModified(self):
        object.__setattr__(self, 'modified', True)
        object.__setattr__(self, 'lastAccess', time.time())

    def resetTime(self):
        object.__setattr__(self, 'lastAccess', time.time())

    def __setattr__(self, name, value):
        # Only record a modification when the value actually changes.
        if (not hasattr(self, name)) or object.__getattribute__(self, name) != value:
            object.__setattr__(self, name, value)
            self.setModified()
        
class example(CacheObject):
    __slots__ = ('abc',)
    def __init__(self,i):
        self.abc = i
        super(example,self).__init__()

t = time.time()
f = example(0)
for i in range(100000):
    f.abc = i
    
print(time.time()-t)

I measured the running time: it took 2 seconds. When I commented out the overridden method, it took 0.1 seconds. I knew the override would be slower, but a gap of almost 20x is too much; I must be getting something wrong.
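
For reference, "commented out" means assignments fall back to the default object.__setattr__. A minimal sketch of that baseline measurement (PlainObject is an illustrative name, not part of the original code):

import time

class PlainObject(object):
    # Same fields, but no __setattr__ override: each assignment is
    # handled entirely by the default object machinery.
    __slots__ = ('abc', 'modified', 'lastAccess')
    def __init__(self, i):
        self.abc = i
        self.modified = False
        self.lastAccess = time.time()

t = time.time()
f = PlainObject(0)
for i in range(100000):
    f.abc = i  # no Python-level code runs for this assignment

print(time.time() - t)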


Taking the suggestions from cfi:

  1. Eliminate the if condition

    def __setattr__(self,name,value):
        # if (not hasattr(self,name)) or object.__getattribute__(self,name)!=value:
        object.__setattr__(self,name,value)
        self.setModified()
    

    This takes the running time down to 1.9 s, a small improvement. But marking the object as modified when nothing actually changed would cost more elsewhere, so this is not an option.

  2. Change self.func to classname.func(self)

    def __setattr__(self,name,value):
        if (not hasattr(self,name)) or object.__getattribute__(self,name)!=value: 
            object.__setattr__(self,name,value)
            CacheObject.setModified(self)
    

    Runtime is 2.0 s, so nothing really changed.

  3. Inline the setModified function

    def __setattr__(self,name,value):
        if (not hasattr(self,name)) or object.__getattribute__(self,name)!=value: 
            object.__setattr__(self,name,value)
            object.__setattr__(self,'modified',True)
            object.__setattr__(self,'lastAccess',time.time())
    

    This pushes the runtime down to 1.2 s! That's great: it saves almost half the time, though the cost is still high.
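
Putting the inlined version back into the full class, the fastest variant from these experiments looks roughly like this (a sketch; setModified and resetTime are kept for external callers even though __setattr__ no longer goes through them):

import time

class CacheObject(object):
    __slots__ = ('modified', 'lastAccess')

    def __init__(self):
        object.__setattr__(self, 'modified', False)
        object.__setattr__(self, 'lastAccess', time.time())

    def setModified(self):
        object.__setattr__(self, 'modified', True)
        object.__setattr__(self, 'lastAccess', time.time())

    def resetTime(self):
        object.__setattr__(self, 'lastAccess', time.time())

    def __setattr__(self, name, value):
        # Change check kept; setModified's body inlined, saving one
        # Python-level call per tracked assignment.
        if (not hasattr(self, name)) or object.__getattribute__(self, name) != value:
            object.__setattr__(self, name, value)
            object.__setattr__(self, 'modified', True)
            object.__setattr__(self, 'lastAccess', time.time())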


Solution

  • Not a complete answer but some suggestions:

    1. Can you eliminate the value comparison? Of course, that's a feature change to your implementation. But the runtime overhead will be even worse once the attributes hold objects more complex than integers.

    2. Every call to a method via self has to go through full method resolution order (MRO) checking. I don't know whether Python does any MRO caching itself; probably not, because of the types-being-dynamic principle. You should therefore be able to shave off some overhead by changing any self.method(args) to classname.method(self, args), which removes the MRO lookup from the call (see the timing sketch after this list). This applies to self.setModified() in your __setattr__ implementation. In most places you have done this already with the references to object.

    3. Every single function call takes time. You could eliminate some of them, e.g. by moving setModified's functionality into __setattr__ itself.
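
    A quick way to see the method-lookup cost from point 2 in isolation is to time a bound call against a class-qualified call. This is only a sketch; A, m, and a are illustrative names, and the numbers will vary by interpreter:

        import timeit

        class A(object):
            def m(self):
                pass

        a = A()

        # Bound call: attribute lookup on the instance creates a bound
        # method object on every call.
        print(timeit.timeit('a.m()', globals=globals()))
        # Class-qualified call: fetches the plain function from the class.
        print(timeit.timeit('A.m(a)', globals=globals()))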

    Let us know how the timing changes for each of these. I'd run them as separate experiments.

    Edit: Thanks for the timing numbers.

    The overhead may seem drastic (still a factor of 10, it seems). However, put that into perspective against the overall runtime. In other words: how much of your overall runtime will be spent setting those tracked attributes, and how much is spent elsewhere?

    In a single-threaded application, Amdahl's Law is a simple rule for setting expectations straight. An illustration: if 1/3 of the time is spent setting attributes and 2/3 doing other stuff, then slowing the attribute setting down by 10x only slows down that third, so the program as a whole gets about 4x slower, not 10x. The smaller the percentage of time spent on the attributes, the less we have to care. But this may not help you at all if your percentage is high...
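
    To make that arithmetic concrete, a back-of-the-envelope sketch of the 1/3 example:

        # Only the attribute-setting share of the runtime slows down.
        fraction = 1 / 3   # share of runtime spent in tracked assignments
        slowdown = 10      # per-assignment slowdown from the override
        overall = fraction * slowdown + (1 - fraction)
        print(overall)     # 4.0 -> whole program ~4x slower, not 10x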