
Set values outside a defined set to a given value (e.g. NaN) for a column in a pandas DataFrame


Given a defined set of valid values, every value in a pandas DataFrame column that is not in the set should be replaced with a given value, e.g. NaN. The values in the set and in the DataFrame can be assumed to be numeric.

Given the following set of valid values and DataFrame:

import pandas as pd

valid = {5, 22}
df = pd.DataFrame({'a': [5, 1, 7, 22], 'b': [12, 3, 10, 9]})

    a   b
0   5  12
1   1   3
2   7  10
3  22   9

Restricting column a to the valid values would result in:

     a   b
0    5  12
1  NaN   3
2  NaN  10
3   22   9

Solution

  • You can use pd.Series.where:

    df['a'] = df['a'].where(df['a'].isin(valid))
    
    print(df)
    
          a   b
    0   5.0  12
    1   NaN   3
    2   NaN  10
    3  22.0   9
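
Since the question asks for "a given value" rather than NaN specifically, the isin/where combination can be wrapped in a small helper. This is only a minimal sketch: the function name `restrict_to_valid` and the parameter `fill_value` are hypothetical names chosen for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd


def restrict_to_valid(series, valid, fill_value=np.nan):
    """Return a new Series with values not in `valid` replaced by `fill_value`.

    Hypothetical helper for illustration; it does not modify `series`.
    """
    # isin builds a boolean mask that is True for valid values;
    # where keeps entries where the mask is True and substitutes
    # fill_value everywhere else.
    return series.where(series.isin(valid), fill_value)


valid = {5, 22}
df = pd.DataFrame({'a': [5, 1, 7, 22], 'b': [12, 3, 10, 9]})

# Default: invalid values become NaN (column 'a' is upcast to float).
restricted_nan = restrict_to_valid(df['a'], valid)

# Alternative: replace invalid values with a sentinel such as -1.
restricted_neg = restrict_to_valid(df['a'], valid, fill_value=-1)
print(restricted_neg.tolist())  # [5, -1, -1, 22]
```

Assigning the result back with `df['a'] = restrict_to_valid(df['a'], valid)` updates the column. With an integer fill value the column should keep its integer dtype, whereas NaN forces the conversion to float visible in the output above.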
    

    A few points to note: