
How to remove all rows in a numpy.ndarray that contain non-numeric values


I read in a dataset as a numpy.ndarray and some of the values are missing (either by just not being there, being NaN, or by being a string written "NA").

I want to clean out all rows containing any entry like this. How do I do that with a numpy ndarray?


Solution

  • >>> import numpy as np
    >>> a = np.array([[1, 2, 3], [4, 5, np.nan], [7, 8, 9]])
    >>> a
    array([[  1.,   2.,   3.],
           [  4.,   5.,  nan],
           [  7.,   8.,   9.]])
    
    >>> a[~np.isnan(a).any(axis=1)]
    array([[ 1.,  2.,  3.],
           [ 7.,  8.,  9.]])
    

    Assign the result back to a (a = a[~np.isnan(a).any(axis=1)]) to drop those rows for good.

    Explanation: np.isnan(a) returns a boolean array of the same shape as a, with True where the value is NaN and False elsewhere. .any(axis=1) reduces the m*n array to a length-m array by applying a logical OR across each row, ~ inverts True/False, and a[ ] selects only those rows of the original array for which the value within the brackets is True.
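The question also mentions entries that are empty or the string "NA". np.isnan only accepts numeric arrays, so those values need to be turned into NaN before the mask above can be applied. The following is a minimal sketch, assuming the data arrived as an array of strings; the helper name to_float_or_nan and the sample array raw are made up for illustration and are not part of the original answer.

```python
import numpy as np

def to_float_or_nan(value):
    """Hypothetical helper: parse a value as float, or return NaN on failure."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # "NA", "" and None all end up here
        return np.nan

# Assumed sample input: every entry was read in as a string
raw = np.array([["1", "2", "3"],
                ["4", "NA", "6"],
                ["7", "8", ""],
                ["10", "11", "12"]], dtype=object)

# Convert element by element into a float array
numeric = np.vectorize(to_float_or_nan, otypes=[float])(raw)

# Same row filter as in the answer above
clean = numeric[~np.isnan(numeric).any(axis=1)]
print(clean)
```

If the data comes from a text file, it may be simpler to let the loader do this conversion; np.genfromtxt, for example, has a missing_values parameter for that purpose.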