
Python: overridden __setitem__ does not replace self outside the method


I am trying to create a DataFrame subclass that inherits from polars.DataFrame. I would like to override the __setitem__ method to make the following statement possible:

df['test_column'] = 'test'

However, when I override __setitem__, the method does not seem to modify the object outside of the method.

This is a simple case, just to see how I can do this. Here is what I did:

import polars as pl

class CustomDataFrame(pl.DataFrame):
    def __setitem__(self, key, value):
        if isinstance(key, str) and isinstance(value, str):
            self = self.with_columns(pl.lit(value).alias(key))
            print(self)
        else:
            super().__setitem__(key, value)

Then I test my code with this:

data = {'test_column': [1, 2, 3, 4, 5], 'test_column1':[1,3,3,4,5]}
df = CustomDataFrame(data)
df['test_column'] = 'test'
print(df)

The problem is that print(df) prints the original DataFrame, not the modified one, whereas the print(self) inside __setitem__ does print the modified DataFrame.

It is as if reassigning self has no effect beyond the method itself. Does anybody know why my code does not work?


Solution

  • You cannot overwrite self inside a method. The assignment only rebinds the local variable named self to the new object, and that local name goes away when the method returns. Since with_columns returns a new DataFrame instead of modifying the existing one, the object the caller holds is not changed in any way.

    If you really want to replace the object's contents, you can try the approach proposed in this answer.

    Following that answer, I came up with the code below. I did not change __class__, since the object should still be your CustomDataFrame and not suddenly become a plain polars DataFrame. Assigning __dict__ seems to work, at least judging from the printed output.

    It is still quite dangerous in my opinion. As long as your class stays as simple as this it might be fine, but once your class also adds its own attributes to self, this will break, because assigning __dict__ replaces all of the instance's attributes at once. I do not know whether there are any other side effects.

    You should probably consider whether you really need this, and whether you need it this way, instead of, e.g., writing a function that you call for the special case you are currently trying to cover in __setitem__.

    import polars as pl
    
    
    class CustomDataFrame(pl.DataFrame):
        def __setitem__(self, key, value):
            if isinstance(key, str) and isinstance(value, str):
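                # with_columns() returns a new DataFrame; it does not modify self
                tmp = self.with_columns(pl.lit(value).alias(key))
                # Replace this object's instance attributes with the new frame's
                # (self and tmp now share the same __dict__ object)
                self.__dict__ = tmp.__dict__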
                print(self)
            else:
                super().__setitem__(key, value)
    
    
    data = {'test_column': [1, 2, 3, 4, 5], 'test_column1':[1,3,3,4,5]}
    df = CustomDataFrame(data)
    df['test_column'] = 'test'
    print(df.__class__, df)
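To see the difference between rebinding self and changing the object's state without needing polars, here is a minimal pure-Python sketch. Frame, with_column, Rebinding and Mutating are hypothetical names made up for illustration; Frame only imitates the fact that with_columns returns a new object instead of modifying the existing one.

```python
class Frame:
    """Hypothetical stand-in for an immutable frame type such as polars.DataFrame."""

    def __init__(self, columns=None):
        self.columns = dict(columns or {})

    def with_column(self, key, value):
        # Like with_columns(): build and return a NEW object, leaving self untouched.
        new_columns = dict(self.columns)
        new_columns[key] = value
        return Frame(new_columns)


class Rebinding(Frame):
    def __setitem__(self, key, value):
        # Only rebinds the local name `self`; the caller's object is not changed.
        self = self.with_column(key, value)


class Mutating(Frame):
    def __setitem__(self, key, value):
        # Copies the new object's data onto the existing object instead.
        self.columns = self.with_column(key, value).columns


a = Rebinding({"test_column": 1})
a["test_column"] = "test"
print(a.columns)  # {'test_column': 1}

b = Mutating({"test_column": 1})
b["test_column"] = "test"
print(b.columns)  # {'test_column': 'test'}
```

Mutating copies only the attribute that holds the data, so any other attributes on the object survive. It is a narrower form of the self.__dict__ = tmp.__dict__ trick, which replaces all of them at once.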