I am used to being able to do things like:
import pandas as pd
df = pd.DataFrame(pd.Categorical(['a', 'b', 'b'], ['a', 'b']), columns=['x'])
df.loc[:,'x'] = df['x'].replace({'a':1, 'b':2})
However, with newer pandas, it throws a warning:
/tmp/ipykernel_1721527/1018712932.py:4: FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value '[1, 2, 2]
Categories (2, object): [1, 2]' has dtype incompatible with category, please explicitly cast to a compatible dtype first.
df.loc[:,'x'] = df['x'].replace({'a':1, 'b':2})
Shortest workaround I can think of is:
ncol = df['x'].replace({'a':1, 'b':2}).astype('float')
df['x'] = None
df = df.astype({'x':'float'})
df.loc[:,'x'] = ncol
But this seems far too long and inelegant for what is ostensibly a very simple operation. Am I missing something obvious?
Ironically, the first part of your question was asked just a few minutes ago. You should not assign through a slice (df.loc[:, 'x']) but rather recreate the column (df['x']) in your assignment when changing the dtype (changing the categories changes the dtype).
The second part requires using cat.rename_categories instead of replace, since the categories of a Categorical Series are immutable, or map if you are changing all the values and do not want a Categorical:
df['x'] = df['x'].cat.rename_categories({'a':1, 'b':2})
# or with map
df['x'] = df['x'].map({'a':1, 'b':2})
Output:
x
0 1
1 2
2 2
Demonstration that the dtypes differ when the categories change:
df = pd.DataFrame(pd.Categorical(['a', 'b', 'b'], ['a', 'b']), columns=['x'])
df['x'].dtype
# CategoricalDtype(categories=['a', 'b'], ordered=False, categories_dtype=object)
df['x'].cat.rename_categories({'a':1, 'b':2}).dtype
# CategoricalDtype(categories=[1, 2], ordered=False, categories_dtype=int64)
df['x'].dtype == df['x'].cat.rename_categories({'a':1, 'b':2}).dtype
# False
df['x'].dtype == df['x'].cat.rename_categories({'a':'1', 'b':'2'}).dtype
# False
df['x'].dtype == df['x'].copy().dtype
# True
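Putting the two points together: if, as in the question, the end goal is a plain numeric column, a short sketch (assuming a recent pandas 2.x) is to rename the categories and then cast explicitly, which is exactly the "explicitly cast to a compatible dtype" that the warning asks for:

```python
import pandas as pd

df = pd.DataFrame(pd.Categorical(['a', 'b', 'b'], ['a', 'b']), columns=['x'])

# Recreate the column: relabel the categories (still categorical,
# but now with int64 categories) ...
df['x'] = df['x'].cat.rename_categories({'a': 1, 'b': 2})

# ... then cast explicitly if you want a plain integer column
# rather than a categorical one:
df['x'] = df['x'].astype('int64')

print(df['x'].dtype)     # int64
print(df['x'].tolist())  # [1, 2, 2]
```

No warning is raised, because the whole-column assignment replaces the column and its dtype instead of writing incompatible values into the existing categorical column in place.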