
Is there an easy way to construct a pandas DataFrame from an Iterable of dataclass or attrs objects?


One can do that with dataclasses like so:

from dataclasses import dataclass
import pandas as pd

@dataclass
class MyDataClass:
    i: int
    s: str


df = pd.DataFrame([MyDataClass(1, "a"), MyDataClass(2, "b")])

That produces a DataFrame df with columns i and s, as one would expect (pandas 1.1 and later accept a list of dataclass instances directly).

Is there an easy way to do that with an attrs class?

I can do it by iterating over the objects' attributes, building a dict[str, list] ({"i": [1, 2], "s": ["a", "b"]} in this case), and constructing the DataFrame from that, but it would be nice to have support for attrs objects directly.


Solution

  • You can access the instance dictionary (__dict__) that stores a dataclass instance's attributes like so:

    a = MyDataClass(1, "a")
    a.__dict__
    

    This outputs:

    {'i': 1, 's': 'a'}
    

    Knowing this, if you have an iterable arr of MyDataClass instances, you can take each instance's __dict__ and construct a DataFrame from the resulting list of dicts:

    arr = [MyDataClass(1, "a"), MyDataClass(2, "b")]
    df = pd.DataFrame([x.__dict__ for x in arr])
    

    df outputs:

       i  s
    0  1  a
    1  2  b
    

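    The limitation of this approach is that it does not work for slotted classes (dataclass(slots=True) in Python 3.10+, or attrs classes created with slots=True, which is the default for @attrs.define), because their instances have no __dict__.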

    Alternatively, you can convert each instance to a tuple or a dict with dataclasses.astuple or dataclasses.asdict, respectively. These go through the class's fields rather than __dict__, so they also work for slotted dataclasses.

    The DataFrame can then be constructed using either of the following:

    import dataclasses

    # using astuple: column names must be supplied separately
    df = pd.DataFrame(
        [dataclasses.astuple(x) for x in arr],
        columns=[f.name for f in dataclasses.fields(MyDataClass)],
    )
    
    # using asdict: the dict keys become the column names
    df = pd.DataFrame([dataclasses.asdict(x) for x in arr])