Assume I have the following class, 'MyClass'.
class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'

instances = [MyClass() for i in range(5)]
Five instances are created and stored in the instances variable. Now, we check its content.
>>> instances
[Myclass(), Myclass(), Myclass(), Myclass(), Myclass()]
To represent each object in the list, Python calls the __repr__ method. However, when the same instances variable is passed to a pandas.DataFrame, the representation of the objects changes and the __str__ method seems to be called instead.
import pandas as pd
df = pd.DataFrame(data=instances)
>>> df
     0
0  Meh
1  Meh
2  Meh
3  Meh
4  Meh
Why has the object's representation changed? Can I determine which representation is used in the DataFrame?
The data is indeed stored as the original objects (the column has dtype object). It seems pandas just calls the __str__ method implicitly when it displays the DataFrame.
You can verify that by calling:
df[0].map(type)
It calls type for each element in the column and returns:
Out[572]:
0 <class '__main__.MyClass'>
1 <class '__main__.MyClass'>
2 <class '__main__.MyClass'>
3 <class '__main__.MyClass'>
4 <class '__main__.MyClass'>
Name: 0, dtype: object
# Likewise, you get the representation
# string of each object with:
df[0].map(repr)
Out[578]:
0 Myclass()
1 Myclass()
2 Myclass()
3 Myclass()
4 Myclass()
Name: 0, dtype: object
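If the goal is to choose the representation only for display, without changing the stored objects, one option is the formatters parameter of DataFrame.to_string. The following is a minimal sketch, assuming the single unnamed column 0 from the question; the variable names default_text and repr_text are only illustrative.

```python
import pandas as pd


class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'


# Same construction as in the question: one object column labelled 0.
df = pd.DataFrame(data=[MyClass() for _ in range(5)])

# Default rendering: the cells are shown via str().
default_text = df.to_string()

# Per-column formatter: render column 0 via repr() instead.
repr_text = df.to_string(formatters={0: repr})

print(default_text)
print(repr_text)
```

The DataFrame itself is unchanged by this; only the rendered text differs, so df[0].map(type) still reports MyClass for every row.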
Btw, if you want to create a DataFrame whose column holds this data under an explicit name, rather use:
df = pd.DataFrame({'my_instances': instances})
This way, you assign a column name.
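To make the difference concrete, here is a small sketch comparing the column labels of both constructions. It assumes the MyClass definition from the question; unnamed and named are just illustrative variable names.

```python
import pandas as pd


class MyClass:
    def __repr__(self):
        return 'Myclass()'

    def __str__(self):
        return 'Meh'


instances = [MyClass() for _ in range(5)]

# Passing the list directly yields a default integer column label.
unnamed = pd.DataFrame(data=instances)
unnamed_columns = list(unnamed.columns)

# Passing a dict assigns the column name explicitly.
named = pd.DataFrame({'my_instances': instances})
named_columns = list(named.columns)

# The resulting Series carries the column name along.
repr_series = named['my_instances'].map(repr)

print(unnamed_columns, named_columns, repr_series.name)
```

With the named column, the lookups above become df['my_instances'].map(type) and df['my_instances'].map(repr).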