I'm working with both R and Python, and I want to write one of my pandas DataFrames to a Feather file so I can work with it more easily in R. However, when I try to write it, I get the following error:
ArrowInvalid: trying to convert NumPy type float64 but got float32
I double-checked my column types, and they are already float64:
In[1]
df.dtypes
Out[1]
id object
cluster int64
vector_x float64
vector_y float64
I get the same error whether I use feather.write_dataframe(df, "path/df.feather") or df.to_feather("path/df.feather").
I found these issues but couldn't tell whether they were related: https://issues.apache.org/jira/browse/ARROW-1345 and https://github.com/apache/arrow/issues/1430
In the end, I can just save it as a CSV and fix the column types in R (or do the whole analysis in Python), but I was hoping to use Feather.
Edit 1:
I'm still having the same issue despite the great advice below, so here is an update on what I've tried:
df[['vector_x', 'vector_y', 'cluster']] = df[['vector_x', 'vector_y', 'cluster']].astype(float)
df[['doc_id', 'text']] = df[['doc_id', 'text']].astype(str)
df[['doc_vector', 'doc_vectors_2d']] = df[['doc_vector', 'doc_vectors_2d']].astype(list)
df.dtypes
Out[1]:
doc_id object
text object
doc_vector object
cluster float64
doc_vectors_2d object
vector_x float64
vector_y float64
dtype: object
Edit 2:
After much searching, it appears that the issue is that my cluster column is a list type made up of int64 integers. So I guess the real question is: does the Feather format support lists?
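Because df.dtypes only reports `object` for a column of lists, it can't tell you which column is nested. A minimal sketch for finding such columns is below; `find_list_columns` is a hypothetical helper name, and the sample frame is made up to mirror the column names above.

```python
import numpy as np
import pandas as pd


def find_list_columns(df: pd.DataFrame) -> list:
    """Return names of object columns where at least one cell is a list, tuple, or NumPy array."""
    nested = []
    for col in df.columns:
        if df[col].dtype == object:
            # dtypes just says "object", so inspect the actual Python values in each cell
            if df[col].map(lambda v: isinstance(v, (list, tuple, np.ndarray))).any():
                nested.append(col)
    return nested


# Hypothetical sample data mirroring the column names in the question
df = pd.DataFrame({
    "doc_id": ["a", "b"],
    "doc_vector": [[0.1, 0.2], [0.3, 0.4]],
    "cluster": [1, 2],
})
print(df.dtypes)
print(find_list_columns(df))  # ['doc_vector']
```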
Edit 3:
Just to tie this up with a bow: Feather does not support nested data types like lists, at least not yet.
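If you are stuck on a setup that can't write list columns to Feather, one workaround (my own suggestion, not something from the linked issues) is to encode each list as a JSON string before writing, so every column is a flat type, and decode it again in R, for example with `lapply(df$doc_vector, jsonlite::fromJSON)`. This sketch assumes every cell in the chosen columns is list-like (no missing values); `lists_to_json` is a hypothetical helper.

```python
import json

import numpy as np
import pandas as pd


def lists_to_json(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Return a copy of df where each listed column's values are JSON-encoded strings."""
    out = df.copy()
    for col in columns:
        # np.asarray(...).tolist() turns lists or NumPy arrays into plain Python
        # lists of Python scalars, which json.dumps can always encode
        out[col] = out[col].map(lambda v: json.dumps(np.asarray(v).tolist()))
    return out


df = pd.DataFrame({"doc_id": ["a", "b"], "doc_vector": [[0.1, 0.2], [0.3, 0.4]]})
flat = lists_to_json(df, ["doc_vector"])
print(flat["doc_vector"].tolist())  # ['[0.1, 0.2]', '[0.3, 0.4]']
# flat.to_feather("path/df.feather") now only has to store strings
```

The original frame is left untouched, so you can keep working with the real lists in Python.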
- Luckily, I found the cause of my Feather IO error here.
- I also found the solution: pandas' DataFrame.to_feather and read_feather are both built on pyarrow, and pyarrow can write columns that contain lists since it switched to the Feather V2 format (pyarrow 0.17.0, released in 2020).
Solution:
pip install --upgrade pyarrow  # I'm on version 1.0.0 and it works
Then keep using df.to_feather("filename") and pd.read_feather("filename") as before.