One can do that with dataclasses like so:
from dataclasses import dataclass
import pandas as pd

@dataclass
class MyDataClass:
    i: int
    s: str

df = pd.DataFrame([MyDataClass(1, "a"), MyDataClass(2, "b")])
That makes the DataFrame df with columns i and s, as one would expect.
Is there an easy way to do that with an attrs class?
I can do it by iterating over the objects' attributes, building an object of a type like dict[str, list] ({"i": [1, 2], "s": ["a", "b"]} in this case), and constructing the DataFrame from that, but it would be nice to have support for attrs objects directly.
You can access the dictionary at the heart of a dataclass like so:
a = MyDataClass(1, "a")
a.__dict__
This outputs:
{'i': 1, 's': 'a'}
Knowing this, if you have an iterable arr of MyDataClass instances, you can read the __dict__ attribute of each one and construct a DataFrame:
arr = [MyDataClass(1, "a"), MyDataClass(2, "b")]
df = pd.DataFrame([x.__dict__ for x in arr])
df outputs:
   i  s
0  1  a
1  2  b
The limitation of this approach is that it will not work if the slots option is used, because slotted instances have no __dict__.
Alternatively, it is possible to convert the data from a dataclass to a tuple or dictionary using dataclasses.astuple and dataclasses.asdict respectively.
The DataFrame can also be constructed using either of the following:
import dataclasses

# using astuple
df = pd.DataFrame(
    [dataclasses.astuple(x) for x in arr],
    columns=[f.name for f in dataclasses.fields(MyDataClass)],
)
# using asdict
df = pd.DataFrame([dataclasses.asdict(x) for x in arr])