My understanding is that the feather format's advantage is that it preserves types. So I expected that the object dtype of the variable state would be preserved, but it's not. Why? Is there a way around this?
import sys
import pandas
from pandas import Timestamp
print(pandas.__version__)
## 1.3.4
print(sys.version)
## 3.9.7 (default, Sep 16 2021, 08:50:36)
## [Clang 10.0.0 ]
d = pandas.DataFrame({'Date': {0: Timestamp('2020-12-01 00:00:00'), 1: Timestamp('2020-11-01 00:00:00'), 2: Timestamp('2020-10-01 00:00:00'), 3: Timestamp('2020-09-01 00:00:00'), 4: Timestamp('2020-08-01 00:00:00')}, 'state': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1}, 'value': {0: 3.1, 1: 3.4, 2: 3.9, 3: 5.9, 4: 6.4}})
d.dtypes
# Date datetime64[ns]
# state int64
# value float64
# dtype: object
d["state"] = d["state"].astype(object)
d.dtypes
# Date datetime64[ns]
# state object
# value float64
# dtype: object
d.to_feather("test.feather")
d = pandas.read_feather("test.feather")
d.dtypes
# Date datetime64[ns]
# state int64
# value float64
# dtype: object
I want state to be a "string" or "object", but not an "int64". I don't want to have to recast every time I load the dataframe. Thanks!
A while back Quang Hoang suggested in the comments that the following works:
d["state"] = d["state"].astype(str)
I have no explanation to offer. I'll be happy to select any other, better answer.