python pandas

Pandas: IndexingError: Unalignable boolean Series provided as indexer


I'm trying to run what I think is simple code to eliminate any columns with all NaNs, but can't get this to work (axis = 1 works just fine when eliminating rows):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,np.nan,np.nan], 'b':[4,np.nan,6,np.nan], 'c':[np.nan, 8,9,np.nan], 'd':[np.nan,np.nan,np.nan,np.nan]})

df = df[df.notnull().any(axis = 0)]

print(df)

Full error:

    raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Expected output:

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Solution

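  • You need loc, because you are filtering by columns. A boolean Series passed to df[...] is aligned against the row index (0 to 3), but this mask is indexed by the column names: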

    print (df.notnull().any(axis = 0))
    a     True
    b     True
    c     True
    d    False
    dtype: bool
    
    df = df.loc[:, df.notnull().any(axis = 0)]
    print (df)
    
         a    b    c
    0  1.0  4.0  NaN
    1  2.0  NaN  8.0
    2  NaN  6.0  9.0
    3  NaN  NaN  NaN
    

    Or filter the column names first and then select them with []:

    print (df.columns[df.notnull().any(axis = 0)])
    Index(['a', 'b', 'c'], dtype='object')
    
    df = df[df.columns[df.notnull().any(axis = 0)]]
    print (df)
    
         a    b    c
    0  1.0  4.0  NaN
    1  2.0  NaN  8.0
    2  NaN  6.0  9.0
    3  NaN  NaN  NaN
    

    Or use dropna with the parameter how='all' to remove the columns that contain only NaNs:

    print (df.dropna(axis=1, how='all'))
         a    b    c
    0  1.0  4.0  NaN
    1  2.0  NaN  8.0
    2  NaN  6.0  9.0
    3  NaN  NaN  NaN
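
To check that the three approaches above agree, here is a minimal self-contained sketch. It rebuilds the DataFrame from the question each time instead of reassigning df. The names mask, error_name, via_loc, via_columns and via_dropna are only illustrative, and the try/except records whatever exception df[mask] raises on your pandas version rather than assuming a specific one.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, np.nan, np.nan],
    'b': [4, np.nan, 6, np.nan],
    'c': [np.nan, 8, 9, np.nan],
    'd': [np.nan, np.nan, np.nan, np.nan],
})

# Boolean mask over columns: its index is the column labels 'a'..'d',
# not the row labels 0..3.
mask = df.notnull().any(axis=0)

# df[mask] treats the mask as a row filter, so the labels cannot be aligned.
# Record the exception type instead of letting it propagate.
error_name = None
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # pandas may warn about reindexing first
    try:
        df[mask]
    except Exception as exc:
        error_name = type(exc).__name__

# Three ways to keep only the columns that have at least one non-NaN value.
via_loc = df.loc[:, mask]                   # boolean mask on the column axis
via_columns = df[df.columns[mask]]          # select by the surviving labels
via_dropna = df.dropna(axis=1, how='all')   # drop all-NaN columns directly

print(error_name)
print(list(via_loc.columns))
print(via_loc.equals(via_columns) and via_loc.equals(via_dropna))
```

If all you need is to drop all-NaN columns, dropna(axis=1, how='all') is probably the most readable choice. The loc form is more general, since the mask can encode any per-column condition.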