I've got a pandas dataframe that has 100 columns. First col is a string and the rest are ints. I'd like to drop any row where an int column contains a value that is out of range (less than 0 or greater than 500)
I've seen examples of doing this operation on single columns but not on multiple. I've tried the following, where I've filled a list with all 99 column names that are ints, but the index list returned contains the index of every row, so they all get dropped.
drop_list = my_data[(my_data[column_name_list] < 0) | (my_data[column_name_list] > 500)].index
my_data = my_data.drop(drop_list)
If I understand correctly, you should aggregate with `any`/`all` before performing boolean indexing. Your attempt fails because indexing a DataFrame with a 2D boolean mask returns a frame of the same shape with NaN where the condition is False, so its `.index` still contains every row. Reduce the mask to one boolean per row instead:
out = my_data[~(my_data[column_name_list].lt(0).any(axis=1)
                | my_data[column_name_list].gt(500).any(axis=1)
                )]
Example:
# input
   A  0   1    2    3
0  a -1  10  100   10
1  b  2  20  200   20
2  c  3  30  300  600

# output
   A  0   1    2   3
1  b  2  20  200  20

# intermediates
   A  0   1    2    3  lt(0).any(axis=1)  gt(500).any(axis=1)     OR      ~
0  a -1  10  100   10               True                False   True  False
1  b  2  20  200   20              False                False  False   True
2  c  3  30  300  600              False                 True   True  False
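As a self-contained sketch, reconstructing the example frame above (column names `A` and 0–3 are taken from the example, not from your real data):

```python
import pandas as pd

# rebuild the example input frame
my_data = pd.DataFrame({
    "A": ["a", "b", "c"],
    0: [-1, 2, 3],
    1: [10, 20, 30],
    2: [100, 200, 300],
    3: [10, 20, 600],
})
column_name_list = [0, 1, 2, 3]

# keep rows where no int column is below 0 or above 500
out = my_data[~(my_data[column_name_list].lt(0).any(axis=1)
                | my_data[column_name_list].gt(500).any(axis=1)
                )]
print(out)
# only row 1 ("b") survives: row 0 has -1, row 2 has 600
```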
Note that, following De Morgan's law, this is equivalent to:
out = my_data[(my_data[column_name_list].ge(0).all(axis=1)
               & my_data[column_name_list].le(500).all(axis=1)
               )]
Intermediates:
   A  0   1    2    3  ge(0).all(axis=1)  le(500).all(axis=1)    AND
0  a -1  10  100   10              False                 True  False
1  b  2  20  200   20               True                 True   True
2  c  3  30  300  600               True                False  False
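The equivalent form can be checked on the same reconstructed example frame (again, the column names are from the example above):

```python
import pandas as pd

my_data = pd.DataFrame({
    "A": ["a", "b", "c"],
    0: [-1, 2, 3],
    1: [10, 20, 30],
    2: [100, 200, 300],
    3: [10, 20, 600],
})
column_name_list = [0, 1, 2, 3]

# De Morgan-equivalent form: keep rows where every int column is in [0, 500]
out = my_data[(my_data[column_name_list].ge(0).all(axis=1)
               & my_data[column_name_list].le(500).all(axis=1)
               )]
print(out)
# same result as the any-based version: only row 1 ("b") remains
```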