pythonpandasdataframe

How to filter and drop rows in a pandas dataframe from selected columns


I've got a pandas dataframe that has 100 columns. First col is a string and the rest are ints. I'd like to drop any row where an int column contains a value that is out of range (less than 0 or greater than 500)

I've seen examples of doing this operation on single columns but not multiple. I've tried the following where I've filled a list with all 99 column names that are ints but the index list returned contains the index of every row so they all get dropped.

drop_list = my_data[(my_data[column_name_list] < 0) | (my_data[column_name_list] > 500)].index
my_data = my_data.drop(drop_list)

Solution

  • If I understand correctly, you should aggregate with any/all to perform boolean indexing:

    out = my_data[~(my_data[column_name_list].lt(0).any(axis=1)
                   |my_data[column_name_list].gt(500).any(axis=1)
                   )]
    

    Example:

    # input
       A  0   1    2    3
    0  a -1  10  100   10
    1  b  2  20  200   20
    2  c  3  30  300  600
    
    # output
       A  0   1    2   3
    1  b  2  20  200  20
    
    # intermediates
       A  0   1    2    3  lt(0).any(axis=1)  gt(500).any(axis=1)     OR      ~
    0  a -1  10  100   10               True                False   True  False
    1  b  2  20  200   20              False                False  False   True
    2  c  3  30  300  600              False                 True   True  False
    

    Note that following De Morgan's law this is equivalent to:

    out = my_data[(my_data[column_name_list].ge(0).all(axis=1)
                 & my_data[column_name_list].le(500).all(axis=1)
                  )]
    

    Intermediates:

       A  0   1    2    3  ge(0).all(axis=1)  le(500).all(axis=1)    AND
    0  a -1  10  100   10              False                 True  False
    1  b  2  20  200   20               True                 True   True
    2  c  3  30  300  600               True                False  False