Dot notation access in pd.Series() but with priority on elements in the index


I want to implement a class that inherits from pd.Series and acts as a container. It will hold objects of various types as its elements and be created like this:

container = Container(
   a = 12,
   b = [12, 10, 20],
   c = 'string',
   name='this value',
)

I access these elements using dot notation:

print(container.a)     # Output: 12
print(container.b)     # Output: [12, 10, 20]
print(container.c)     # Output: string
print(container.name)  # Output: None, but I want 'this value'

I have tried the following implementation:

import pandas as pd

class Container(pd.Series):
    def __init__(self, **kwargs):
        super().__init__(data=kwargs)

This works to a large extent.

However, there are special cases. If the index label of an element is also the name of a pd.Series attribute, the attribute is returned instead of the element. In the example above, container.name returns None because it resolves to the pd.Series.name property. I want container.name to return 'this value'.
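The lookup order behind this can be reproduced without pandas. The following is a minimal sketch, assuming a hypothetical Box class that stands in for the Series: it has one class-level property (name) and a __getattr__ fallback for stored items. Python only calls __getattr__ after normal lookup has failed, so a property with the same name always wins.

```python
class Box:
    """Hypothetical stand-in for a Series: one class property plus stored items."""

    def __init__(self, **items):
        # Store the items in the instance dict under a private key.
        self.__dict__["_items"] = items

    @property
    def name(self):
        # Plays the role of pd.Series.name, which is None unless set explicitly.
        return None

    def __getattr__(self, item):
        # Called only after normal lookup (instance dict, class, properties) fails.
        items = self.__dict__["_items"]
        if item in items:
            return items[item]
        raise AttributeError(item)


box = Box(a=12, name="this value")
print(box.a)     # 12, found through the __getattr__ fallback
print(box.name)  # None, the property is found first and __getattr__ never runs
```

This is why overriding __getattr__ alone cannot change what container.name returns.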

So I probably need to override __getattribute__ or __getattr__ to make this work.

I have tried the following:

def __getattr__(self, item):
    if item in self:
        return self[item]
    else:
        raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{item}'")

def __getattribute__(self, item):
    try:
        # Try to get an item using dot notation when it's available in the series itself
        value = super().__getattribute__('__getitem__')(item)
        if item in self:
            return value
    except (KeyError, AttributeError):
        pass
    # Fallback to the default __getattribute__
    return super().__getattribute__(item)

However, this always raises RecursionError: maximum recursion depth exceeded. I have tried several variations without success.
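The recursion most likely comes from __getattribute__ touching attributes of self while it is running: __getitem__ and the in check both read attributes of the Series internally, and each of those reads re-enters __getattribute__. A minimal sketch of the same effect without pandas, using two hypothetical classes (Recursive and Safe) that are not part of the original code:

```python
class Recursive:
    """Hypothetical class whose __getattribute__ reads an attribute of self."""

    def __init__(self):
        object.__setattr__(self, "_items", {"a": 12})

    def __getattribute__(self, item):
        # self._items is itself an attribute access, so this method calls
        # itself again before it can return anything.
        if item in self._items:
            return self._items[item]
        return super().__getattribute__(item)


class Safe:
    """Same idea, but internal reads go through object.__getattribute__."""

    def __init__(self):
        object.__setattr__(self, "_items", {"a": 12})

    def __getattribute__(self, item):
        # object.__getattribute__ bypasses this override, so there is no re-entry.
        items = object.__getattribute__(self, "_items")
        if item in items:
            return items[item]
        return object.__getattribute__(self, item)


try:
    Recursive().a
except RecursionError as exc:
    print(type(exc).__name__)  # RecursionError

print(Safe().a)  # 12
```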

Note that I want to inherit from pd.Series so that I keep access to all the other Series functionality.


Solution

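  • Store the keyword arguments in a plain dict on the instance (_custom_attrs, set with object.__setattr__). In __getattribute__, read that dict through object.__getattribute__ so the lookup does not recurse, and check it first. If the name is in the dict, return its value directly. Otherwise, fall back to the default pd.Series attribute lookup.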

    Here is the solution:

    import pandas as pd
    
    class Container(pd.Series):
        def __init__(self, **kwargs):
            object.__setattr__(self, "_custom_attrs", kwargs)
            super().__init__(kwargs)
    
        def __getattr__(self, item):
            if item in self._custom_attrs:
                return self._custom_attrs[item]
            if item in self.index:
                return super().__getitem__(item)
            raise AttributeError(f"'{self.__class__.__name__}' object has no attribute '{item}'")
    
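        def __getattribute__(self, item):
            # Names that pandas needs internally bypass the custom lookup.
            if item in {"_custom_attrs", "index", "dtype", "_mgr", "_data"}:
                return object.__getattribute__(self, item)
            
            # Read the dict via object.__getattribute__ to avoid re-entering this method.
            custom_attrs = object.__getattribute__(self, "_custom_attrs")
            if item in custom_attrs:
                return custom_attrs[item]
    
            # Everything else uses the normal pd.Series lookup.
            return pd.Series.__getattribute__(self, item)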
    
    container = Container(
        a=12,
        b=[12, 10, 20],
        c='string',
        name='this value'
    )
    
    print(container.a)     # Output: 12
    print(container.b)     # Output: [12, 10, 20]
    print(container.c)     # Output: string
    print(container.name)  # Output: this value
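One thing to be aware of with this approach: dot access reads from the _custom_attrs dict, which is a snapshot taken at construction time, while bracket access reads from the Series data. The sketch below repeats a condensed copy of the class (without the __getattr__ override) so that it runs on its own, and shows the two going out of sync after an assignment. It assumes a recent pandas version and is meant as an illustration, not a fix.

```python
import pandas as pd


class Container(pd.Series):
    """Condensed copy of the class above, repeated so this sketch is runnable."""

    def __init__(self, **kwargs):
        object.__setattr__(self, "_custom_attrs", kwargs)
        super().__init__(kwargs)

    def __getattribute__(self, item):
        if item in {"_custom_attrs", "index", "dtype", "_mgr", "_data"}:
            return object.__getattribute__(self, item)
        custom_attrs = object.__getattribute__(self, "_custom_attrs")
        if item in custom_attrs:
            return custom_attrs[item]
        return pd.Series.__getattribute__(self, item)


c = Container(a=12, name="this value")
c["a"] = 99        # updates the Series data only

print(c["a"])      # 99, read from the Series
print(c.a)         # 12, read from the snapshot dict
print(c.name)      # this value
```

If the two views need to stay consistent, one option is to also update _custom_attrs whenever an item is assigned, for example in an overridden __setitem__.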