
Is it possible to extend a class from __init__.py?


I have written a small Python package that extends pandas DataFrames with a few additional methods.

At the moment, I have this code in my package:

import pandas as pd


def init():
    # The accessor is only registered when init() is called explicitly.
    @pd.api.extensions.register_dataframe_accessor("test")
    class _:
        def __init__(self, pandas_obj):
            self._obj = pandas_obj

        def myMethod(self):
            pass

I then do the following in Python:

import pandas as pd
import mypackage as mp
mp.init()
df = pd.DataFrame(<define data frame>)
df.test.myMethod()  # the accessor is available under the name it was registered with, "test"

My question is: is it possible to do the pandas import and register the accessor from within the __init__.py of mypackage, so that as soon as mypackage is imported I automatically have access to myMethod, without the init() step? My current approach feels a bit clunky.


Solution

  • I might be missing something in your question, but I think you might be barking up the wrong tree. There's nothing special about __init__.py in this regard: any code in __init__.py is executed when the package is first imported, so I don't think you need that init() function at all. If you have a file containing:

    # mypackage/__init__.py
    import pandas as pd
    
    
    @pd.api.extensions.register_dataframe_accessor("test")
    class _:
        def __init__(self, pandas_obj):
            self._obj = pandas_obj
    
        def myMethod(self):
            print(self._obj)
    

    Now you can use it just by importing mypackage:

    >>> import pandas as pd
    >>> import mypackage
    >>> df = pd.DataFrame({'a': [1, 2, 3]})
    >>> df.test.myMethod()
       a
    0  1
    1  2
    2  3
    

    As an aside, one reason you might explicitly want something like your init() function is the principle of least surprise: since register_dataframe_accessor modifies the DataFrame class itself, affecting every DataFrame for all users (including other libraries), there is a small possibility that merely importing your package overrides another package's DataFrame accessor if the two happen to share the same name. pandas emits a UserWarning when a registration overrides an existing attribute, but the newer accessor still replaces the old one.

    If the name is reasonably unique, this may not be a problem, though. It also may simply not be a problem for your package, depending on how it's intended to be used.
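A minimal sketch to check that module-level code in __init__.py really does run on import, with no init() call: it writes a throwaway package to a temporary directory and imports it. The package name mypackage_demo is hypothetical, and here myMethod returns the row count instead of printing, purely so the result is easy to check.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

import pandas as pd

# Contents of a hypothetical mypackage_demo/__init__.py: the accessor is
# registered at module level, so registration is a side effect of importing.
INIT_SOURCE = textwrap.dedent(
    """
    import pandas as pd

    @pd.api.extensions.register_dataframe_accessor("test")
    class _:
        def __init__(self, pandas_obj):
            self._obj = pandas_obj

        def myMethod(self):
            # Return something checkable instead of printing.
            return len(self._obj)
    """
)

with tempfile.TemporaryDirectory() as tmp:
    pkg_dir = Path(tmp) / "mypackage_demo"
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text(INIT_SOURCE)

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        # The import alone runs __init__.py and registers the accessor.
        demo = importlib.import_module("mypackage_demo")
    finally:
        sys.path.remove(tmp)

df = pd.DataFrame({"a": [1, 2, 3]})
row_count = df.test.myMethod()
print(row_count)  # 3

# A second import is served from sys.modules, so __init__.py is not re-run.
assert importlib.import_module("mypackage_demo") is demo
```

Using a real import (rather than exec-ing the source) means sys.modules caching behaves as it would for an installed package: the accessor is registered once per interpreter session, the first time the package is imported.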
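The name-collision risk mentioned in the aside can be observed directly. In this sketch, two classes stand in for two hypothetical packages that both register a DataFrame accessor under the same (hypothetical) name "shared". pandas warns on the second registration, and from then on the second accessor is what every caller in the process gets.

```python
import warnings

import pandas as pd


# Stand-in for "package A" registering its accessor on import.
@pd.api.extensions.register_dataframe_accessor("shared")
class FirstAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def who(self):
        return "first"


# Stand-in for "package B", imported later, reusing the same accessor name.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded

    @pd.api.extensions.register_dataframe_accessor("shared")
    class SecondAccessor:
        def __init__(self, pandas_obj):
            self._obj = pandas_obj

        def who(self):
            return "second"


df = pd.DataFrame({"a": [1, 2, 3]})
print(df.shared.who())  # second
print(any(issubclass(w.category, UserWarning) for w in caught))  # True
```

The warning is the only signal that package A's accessor has been replaced, which is why a distinctive accessor name, or an explicit opt-in call like the original init(), is worth considering.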