
How do I use Pandas' infer_objects correctly (v. 2.2.3)?


I try the following example in Pandas 2.2.3:

import pandas as pd

outage_mask = pd.Series(([True]*5 + [False]*5)*5, index=pd.date_range("2025-01-01", freq="1h", periods=50))
[ts for ts in outage_mask.loc[outage_mask.diff().fillna(False)].index]

This gives me the following warning:

FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set pd.set_option('future.no_silent_downcasting', True)

I cannot figure out how to apply this infer_objects correctly. I assume the problem is that the output of diff becomes an 'object' dtype due to containing both NaNs and bools, but this, for example, does not help:

[ts for ts in outage_mask.loc[outage_mask.diff().infer_objects(copy=False).fillna(False)].index]

I can avoid the warning by this clumsy work-around:

[ts for ts in outage_mask.loc[outage_mask.diff().astype(float).fillna(0.).astype(bool)].index]

but I would like to understand how to apply the solution from the warning correctly. How do I do that?


Solution

  • I would use convert_dtypes here, which converts an object Series holding a mix of True/False/NaN to the nullable boolean pandas dtype:

    [
        ts
        for ts in outage_mask.loc[
            outage_mask.diff().convert_dtypes().fillna(False)
        ].index
    ]
    

    You actually don't even need the fillna, since a missing value (pd.NA) in a nullable boolean mask is treated as False when indexing, and you can skip the list comprehension:

    list(outage_mask.loc[outage_mask.diff().convert_dtypes()].index)
    

    Output:

    [Timestamp('2025-01-01 05:00:00'),
     Timestamp('2025-01-01 10:00:00'),
     Timestamp('2025-01-01 15:00:00'),
     Timestamp('2025-01-01 20:00:00'),
     Timestamp('2025-01-02 01:00:00'),
     Timestamp('2025-01-02 06:00:00'),
     Timestamp('2025-01-02 11:00:00'),
     Timestamp('2025-01-02 16:00:00'),
     Timestamp('2025-01-02 21:00:00')]
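
To see why infer_objects did not help in the question while convert_dtypes does, it can be useful to print the dtype at each stage. The following is a minimal sketch, assuming the same outage_mask as in the question; the variable names (changed, dtype_raw, dtype_inferred, dtype_converted, switch_times) are made up for illustration.

```python
import pandas as pd

# Same series as in the question: 5 hours True, 5 hours False, repeated 5 times.
outage_mask = pd.Series(
    ([True] * 5 + [False] * 5) * 5,
    index=pd.date_range("2025-01-01", freq="1h", periods=50),
)

# diff() on a bool Series leaves a NaN in the first position,
# so the result cannot stay a plain bool Series.
changed = outage_mask.diff()

# dtype straight out of diff(): expected to be object
dtype_raw = str(changed.dtype)

# infer_objects() only looks for a better NumPy dtype; with a NaN next to
# the bools there is none, so this is expected to stay object
dtype_inferred = str(changed.infer_objects().dtype)

# convert_dtypes() may use the nullable extension dtypes: expected boolean
dtype_converted = str(changed.convert_dtypes().dtype)

print(dtype_raw, dtype_inferred, dtype_converted)

# The nullable mask can be used for indexing directly; pd.NA counts as False.
switch_times = list(outage_mask.loc[changed.convert_dtypes()].index)
print(len(switch_times), switch_times[0])
```

As far as I can tell, the "result" in the warning text refers to the output of fillna, so the suggested infer_objects(copy=False) call is meant to come after fillna (with future.no_silent_downcasting enabled), not before it. That would explain why calling it directly on the output of diff changes nothing.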