Tags: python, pandas, scipy, sparse-matrix, modin

Modin AttributeError when importing from sparse matrix


I am trying to use the Modin package to load a sparse matrix created with SciPy (specifically, a scipy.sparse.csr_matrix) into a DataFrame.

Invoking the method:

from modin import pandas as pd
# mat is a scipy.sparse.csr_matrix built earlier
pd.DataFrame.sparse.from_spmatrix(mat)

I am getting the following AttributeError:

AttributeError                            Traceback (most recent call last)
C:\Users\BERGAM~1\AppData\Local\Temp/ipykernel_37436/3032405809.py in <module>
----> 1 pd.DataFrame.sparse.from_spmatrix(mat)

C:\Miniconda3\envs\persolite_v0\lib\site-packages\modin\pandas\accessor.py in from_spmatrix(cls, data, index, columns)
    109     @classmethod
    110     def from_spmatrix(cls, data, index=None, columns=None):
--> 111         return cls._default_to_pandas(
    112             pandas.DataFrame.sparse.from_spmatrix, data, index=index, columns=columns
    113         )

C:\Miniconda3\envs\persolite_v0\lib\site-packages\modin\pandas\accessor.py in _default_to_pandas(self, op, *args, **kwargs)
     78             Result of operation.
     79         """
---> 80         return self._parent._default_to_pandas(
     81             lambda parent: op(parent.sparse, *args, **kwargs)
     82         )

AttributeError: 'function' object has no attribute '_parent'

With the original pandas API, the same call works.
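For comparison, here is a minimal sketch of the plain-pandas call that does work. The matrix values are made up for illustration and stand in for mat; pandas is imported under its full name so it is not confused with Modin's pd.

```python
import numpy as np
import pandas
from scipy.sparse import csr_matrix

# Hypothetical small CSR matrix standing in for `mat` from the question
mat = csr_matrix(np.array([[0, 1, 0], [2, 0, 0], [0, 0, 3]]))

# Plain pandas: build a DataFrame with sparse columns from the SciPy matrix
df = pandas.DataFrame.sparse.from_spmatrix(mat)

print(df.shape)  # (3, 3)
print(df.sparse.density)  # fraction of stored (non-fill) values
print(df.sparse.to_dense())
```

As an untested workaround, the resulting pandas frame could then be passed to Modin's DataFrame constructor (for example modin.pandas.DataFrame(df)), which accepts a pandas DataFrame; whether Modin keeps the sparse dtypes or densifies them is not verified here.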

Has anyone run into a similar problem? Thanks for the support.


Solution

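  • This is a bug in Modin. The accessor's from_spmatrix is a classmethod, but it calls _default_to_pandas, which is an instance method. Because cls._default_to_pandas is looked up on the class rather than on an instance, it is a plain function: no instance is bound to self, so self instead receives the first positional argument, which here is a function (pandas.DataFrame.sparse.from_spmatrix).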

    This is the relevant code in modin/pandas/accessor.py:

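    import pandas

    class BaseSparseAccessor:
        # (other members omitted)

        def _default_to_pandas(self, op, *args, **kwargs):
            # Instance method: expects self to be an accessor bound to a Modin object
            return self._parent._default_to_pandas(
                lambda parent: op(parent.sparse, *args, **kwargs)
            )

    class SparseFrameAccessor(BaseSparseAccessor):

        @classmethod
        def from_spmatrix(cls, data, index=None, columns=None):
            # cls._default_to_pandas is an unbound function, so the first
            # argument below (a function) becomes self in _default_to_pandas
            return cls._default_to_pandas(
                pandas.DataFrame.sparse.from_spmatrix, data, index=index, columns=columns
            )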
    

    Here is a minimal example that reproduces the failure:

    class A:
        
        _parent = 0
        
        def a_method(self, op, **args):
            self._parent = op(self._parent, **args)
    
    class B(A):
        
        @classmethod
        def b_method(cls, data, **args):
            return cls.a_method(sum, data, **args)
    

    Calling b_method fails whether it is called on the class B or on an instance of B. Inside b_method, cls.a_method is a plain function, so Python binds the first positional argument, the built-in sum, to self in a_method instead of an instance of A or B.

    >>> B.b_method(20)
    
    AttributeError                            Traceback (most recent call last)
    <ipython-input-17-3914ce57d001> in <module>
    ----> 1 B.b_method(20)
    
    <ipython-input-11-a25ce2c0614c> in b_method(cls, data, **args)
         12     @classmethod
         13     def b_method(cls, data, **args):
    ---> 14         return cls.a_method(sum, data, **args)
    
    <ipython-input-11-a25ce2c0614c> in a_method(self, op, **args)
          6 
          7     def a_method(self, op, **args):
    ----> 8         self._parent = op(self._parent, **args)
          9 
         10 class B(A):
    
    AttributeError: 'builtin_function_or_method' object has no attribute '_parent'
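One possible fix for the toy example, sketched under the assumption that the helper needs no per-instance state: make a_method a classmethod as well, so Python binds cls automatically and sum arrives as op. This only illustrates the binding rule; it is not necessarily how Modin fixed from_spmatrix. The Plain class and its who_is_self method are hypothetical and exist only to isolate the root cause. The rewritten a_method also takes *args and passes op a list, because sum expects an iterable.

```python
class Plain:
    # Hypothetical class that isolates the root cause
    def who_is_self(self, op):
        return self


# Looked up on the class, who_is_self is a plain function,
# so the first positional argument (sum) is bound to self.
print(Plain.who_is_self(sum, None) is sum)  # True


class A:
    _parent = 0

    @classmethod
    def a_method(cls, op, *args):
        # cls is bound automatically to the class the call goes through
        return op([cls._parent, *args])


class B(A):
    @classmethod
    def b_method(cls, data):
        # sum is now passed as op, not as the receiver
        return cls.a_method(sum, data)


print(B.b_method(20))    # 20
print(B().b_method(20))  # 20, same binding when called on an instance
```

In Modin's case, the analogous change would be for from_spmatrix to call pandas.DataFrame.sparse.from_spmatrix directly and wrap the result in a Modin DataFrame, instead of routing through the instance method _default_to_pandas. That is a suggestion, not a description of Modin's actual patch.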