
Pandas replace and downcasting deprecation since version 2.2.0


Replacing strings with numerical values used to be easy, but since pandas 2.2 the simple approach below emits a warning. What is the "correct" way to do this now?

>>> s = pd.Series(["some", "none", "all", "some"])
>>> s.dtypes
dtype('O')

>>> s.replace({"none": 0, "some": 1, "all": 2})
FutureWarning: Downcasting behavior in `replace` is deprecated and will be 
removed in a future version. To retain the old behavior, explicitly call
`result.infer_objects(copy=False)`. To opt-in to the future behavior, set
`pd.set_option('future.no_silent_downcasting', True)`
0    1
1    0
2    2
3    1
dtype: int64

If I understand the warning correctly, the object dtype is "downcast" to int64. Perhaps pandas wants me to do this explicitly, but I don't see how I could downcast a string to a numerical type before the replacement happens.


Solution

  • When you run:

    s.replace({"none": 0, "some": 1, "all": 2})
    

    The dtype of the output is currently int64, because pandas infers that all the resulting values are integers.

    print(s.replace({"none": 0, "some": 1, "all": 2}).dtype) # int64
    

    In a future pandas version this will no longer happen automatically; the dtype will remain object (you will still have integers, but stored as objects, not int64):

    pd.set_option('future.no_silent_downcasting', True)
    print(s.replace({"none": 0, "some": 1, "all": 2}).dtype) # object
    

    You will have to explicitly downcast the objects to integers (after the replacement):

    s.replace({"none": 0, "some": 1, "all": 2}).infer_objects(copy=False)
    
    print(s.replace({"none": 0, "some": 1, "all": 2})
           .infer_objects(copy=False).dtype)            # int64
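
The two steps above (replace, then infer the dtype explicitly) could be wrapped in a small helper. This is only a sketch: the name `replace_and_infer` is made up for illustration and is not part of pandas, the optional `copy=False` argument is omitted, and the warning filter assumes the message is raised as a `FutureWarning`, as in the output shown in the question.

```python
import warnings

import pandas as pd


def replace_and_infer(series, mapping):
    """Hypothetical helper: replace values, then infer the dtype explicitly."""
    with warnings.catch_warnings():
        # Ignore FutureWarnings raised inside this block, such as the
        # downcasting warning that pandas 2.2 emits from `replace`.
        warnings.simplefilter("ignore", FutureWarning)
        replaced = series.replace(mapping)
    # Explicit conversion: an object column holding only integers becomes int64.
    return replaced.infer_objects()


s = pd.Series(["some", "none", "all", "some"], dtype=object)
codes = replace_and_infer(s, {"none": 0, "some": 1, "all": 2})
print(codes.tolist())
print(codes.dtype)
```

Because the conversion is done explicitly, the result should not depend on whether `replace` itself downcasts in the installed pandas version.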