python pandas dataframe boolean nullable

How to set pandas.DataFrame cell to null without FutureWarning


I would like to set some cells to null based on a condition. For example:

import pandas as pd  # version is 2.2.2
df = pd.DataFrame({'x': [1, 2, 2, 1, 1, 2]})
df["b"] = False
df.loc[df["x"] == 1, "b"] = pd.NA

It works, but I get this warning:

FutureWarning: Setting an item of incompatible dtype is deprecated and will raise an error in a future version of pandas. Value 'nan' has dtype incompatible with bool, please explicitly cast to a compatible dtype first.

I tried reading the documentation and looking at examples, but could not find a solution. What is the correct way to do this?


Solution

  • Defining b with df['b'] = False gives the column the NumPy bool dtype. pd.NA is not a valid bool value, so it cannot be stored in that column without changing the column's dtype, which triggers the warning (this will be an error in a future version).

    You could initialize the column as object:

    import numpy as np
    
    df['b'] = np.array(False, dtype='object')
    
    df.loc[df['x']==1, 'b'] = pd.NA
    

    Then df['b'].dtype is dtype('O') (object).

    Or, better, as nullable boolean:

    df['b'] = pd.Series(False, index=df.index, dtype='boolean')
    
    df.loc[df['x']==1, 'b'] = pd.NA
    

    Note that you could also first initialize a nullable boolean column of <NA>s, then assign False where df['x']!=1:

    df['b'] = pd.Series(dtype='boolean')
    
    df.loc[df['x']!=1, 'b'] = False
    

    Now df['b'].dtype is BooleanDtype (nullable boolean).

    Output:

       x      b
    0  1   <NA>
    1  2  False
    2  2  False
    3  1   <NA>
    4  1   <NA>
    5  2  False
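
If the bool column already exists, as in the question, another option is to cast it to the nullable dtype before assigning pd.NA. The following is a minimal sketch, assuming the same df as above. The warnings filter is only there to check that no FutureWarning is emitted; it is not part of the fix itself.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, 1, 1, 2]})
df["b"] = False  # plain NumPy bool column, as in the question

# Cast the existing column to the nullable boolean extension dtype first.
df["b"] = df["b"].astype("boolean")

with warnings.catch_warnings():
    # Turn any FutureWarning into an exception so a regression would be visible.
    warnings.simplefilter("error", FutureWarning)
    df.loc[df["x"] == 1, "b"] = pd.NA

print(df["b"].dtype)
print(df)
```

The cast keeps the existing False values and only changes how missing values are represented, so the later assignment no longer needs an incompatible upcast.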