My code:
xdf = pd.DataFrame(data={'A':[-10,np.nan,-2.2],'B':[np.nan,2,1.5],'C':[3,1,-0.3]},index=['2023-05-13 08:40:00','2023-05-13 08:41:00','2023-05-13 08:42:00'])
xdf =
A B C
2023-05-13 08:40:00 -10.0 NaN 3.0
2023-05-13 08:41:00 NaN 2.0 1.0
2023-05-13 08:42:00 -2.2 1.5 -0.3
Consider only values below 4.0 and above -4.0 in each row of the dataframe
print(xdf[((xdf<4.0).all(axis=1))&((xdf>-4.0).all(axis=1))])
Present output:
A B C
2023-05-13 08:42:00 -2.2 1.5 -0.3
Expected output: My above code drops a row if there is a NaN in one column, despite other columns satisfying the filter condition. So, I want to omit NaN columns and consider non-NaN columns in <> operation.
A B C
2023-05-13 08:41:00 NaN 2.0 1.0
2023-05-13 08:42:00 -2.2 1.5 -0.3
Edit:
One working solution:
print(xdf[((xdf.fillna(True)<4.0).all(axis=1))&((xdf.fillna(True)>-4.0).all(axis=1))])
I suggest add new mask DataFrame.isna
chained with |
(bitwise OR
) for test missing values:
print(xdf[((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1)])
A B C
2023-05-13 08:41:00 NaN 2.0 1.0
2023-05-13 08:42:00 -2.2 1.5 -0.3
How it working:
print ((xdf<4.0) & (xdf>-4.0))
A B C
2023-05-13 08:40:00 False False True
2023-05-13 08:41:00 False True True
2023-05-13 08:42:00 True True True
print (((xdf<4.0) & (xdf>-4.0) | xdf.isna()))
A B C
2023-05-13 08:40:00 False True True
2023-05-13 08:41:00 True True True
2023-05-13 08:42:00 True True True
print(((xdf<4.0) & (xdf>-4.0) | xdf.isna()).all(axis=1))
2023-05-13 08:40:00 False
2023-05-13 08:41:00 True
2023-05-13 08:42:00 True
dtype: bool
Another idea is use DataFrame.lt
and
DataFrame.gt
:
print(xdf[(xdf.lt(4.0) & xdf.gt(-4.0) | xdf.isna()).all(axis=1)])
A B C
2023-05-13 08:41:00 NaN 2.0 1.0
2023-05-13 08:42:00 -2.2 1.5 -0.3