python pandas dataframe nan

How to find which columns contain any NaN value in a Pandas DataFrame


Given a pandas DataFrame that may contain NaN values scattered here and there.

Question: How do I determine which columns contain NaN values? In particular, can I get a list of the column names containing NaNs?


Solution

  • UPDATE: using Pandas 0.22.0

    Newer Pandas versions provide the methods 'DataFrame.isna()' and 'DataFrame.notna()':

    In [71]: df
    Out[71]:
         a    b  c
    0  NaN  7.0  0
    1  0.0  NaN  4
    2  2.0  NaN  4
    3  1.0  7.0  0
    4  1.0  3.0  9
    5  7.0  4.0  9
    6  2.0  6.0  9
    7  9.0  6.0  4
    8  3.0  0.0  9
    9  9.0  0.0  1
    
    In [72]: df.isna().any()
    Out[72]:
    a     True
    b     True
    c    False
    dtype: bool
    

    As a list of column names:

    In [74]: df.columns[df.isna().any()].tolist()
    Out[74]: ['a', 'b']
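
For anyone who wants to reproduce the session, here is a minimal sketch that rebuilds the example frame and extracts the column names. The values are copied from the printed output above; the variable names `has_nan` and `nan_columns` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame (values taken from the printed output above).
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Boolean Series indexed by column name: True where the column has any NaN.
has_nan = df.isna().any()

# Use the boolean Series to filter the column index, then convert to a list.
nan_columns = df.columns[has_nan].tolist()
print(nan_columns)  # ['a', 'b']
```

Because `any()` reduces along the rows by default, the result has one entry per column, which is what makes it usable as a mask on `df.columns`.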
    

    To select those columns (the ones containing at least one NaN value):

    In [73]: df.loc[:, df.isna().any()]
    Out[73]:
         a    b
    0  NaN  7.0
    1  0.0  NaN
    2  2.0  NaN
    3  1.0  7.0
    4  1.0  3.0
    5  7.0  4.0
    6  2.0  6.0
    7  9.0  6.0
    8  3.0  0.0
    9  9.0  0.0
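
The 'notna()' method is mentioned above but not demonstrated. As a rough sketch, assuming the same example frame, it can be used for the opposite selection, that is, the columns with no NaN at all. The names `complete` and `complete_columns` are made up for this example.

```python
import numpy as np
import pandas as pd

# Same example frame as in the session above.
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# notna() is the element-wise negation of isna(); all() is True only
# for columns in which every value is present.
complete = df.notna().all()

# Keep only the fully populated columns.
complete_columns = df.columns[complete].tolist()
clean = df.loc[:, complete]
print(complete_columns)  # ['c']
```

Here `df.notna().all()` is the logical complement of `df.isna().any()`, so `df.loc[:, ~df.isna().any()]` should give the same result.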
    

    OLD answer:

    Try using isnull():

    In [97]: df
    Out[97]:
         a    b  c
    0  NaN  7.0  0
    1  0.0  NaN  4
    2  2.0  NaN  4
    3  1.0  7.0  0
    4  1.0  3.0  9
    5  7.0  4.0  9
    6  2.0  6.0  9
    7  9.0  6.0  4
    8  3.0  0.0  9
    9  9.0  0.0  1
    
    In [98]: pd.isnull(df).sum() > 0
    Out[98]:
    a     True
    b     True
    c    False
    dtype: bool
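
The `sum() > 0` form works because summing a boolean column counts its True values. A small sketch, assuming the same example frame, that shows the per-column counts and checks that the comparison matches `any()`:

```python
import numpy as np
import pandas as pd

# Same example frame as in the session above.
df = pd.DataFrame({
    "a": [np.nan, 0, 2, 1, 1, 7, 2, 9, 3, 9],
    "b": [7, np.nan, np.nan, 7, 3, 4, 6, 6, 0, 0],
    "c": [0, 4, 4, 0, 9, 9, 9, 4, 9, 1],
})

# Number of missing values per column (True counts as 1 when summed).
nan_counts = pd.isnull(df).sum()
print(nan_counts.to_dict())  # {'a': 1, 'b': 2, 'c': 0}

# Comparing the counts with zero yields the same flags as any().
flags_from_sum = nan_counts > 0
flags_from_any = df.isnull().any()
print(flags_from_sum.equals(flags_from_any))  # True
```

The counting form is handy when you also want to know how many values are missing; if only a yes/no answer per column is needed, `any()` states the intent more directly.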
    

    Or, as @root proposed, a clearer version:

    In [5]: df.isnull().any()
    Out[5]:
    a     True
    b     True
    c    False
    dtype: bool
    
    In [7]: df.columns[df.isnull().any()].tolist()
    Out[7]: ['a', 'b']
    

    To select a subset, namely all columns containing at least one NaN value:

    In [31]: df.loc[:, df.isnull().any()]
    Out[31]:
         a    b
    0  NaN  7.0
    1  0.0  NaN
    2  2.0  NaN
    3  1.0  7.0
    4  1.0  3.0
    5  7.0  4.0
    6  2.0  6.0
    7  9.0  6.0
    8  3.0  0.0
    9  9.0  0.0
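
If the lookup is needed in several places, it could be wrapped in a small helper. This is only a sketch: the function name `columns_with_nan` and the `example` frame are hypothetical and not part of pandas or of the original answer. It also illustrates that `isna()` treats `None` in an object column as missing, not only float NaN.

```python
import numpy as np
import pandas as pd


def columns_with_nan(frame):
    """Return the names of the columns that contain at least one missing value."""
    mask = frame.isna().any()  # one boolean per column
    return frame.columns[mask].tolist()


# Hypothetical frame with a float NaN, a None in a string column,
# and a fully populated integer column.
example = pd.DataFrame({
    "x": [1.0, np.nan],
    "y": ["u", None],
    "z": [1, 2],
})

print(columns_with_nan(example))  # ['x', 'y']
```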