pythonpandasindexingboolean

Python Pandas difference in boolean indexing between ~ != and ==


I am confused about different results of boolean indexing when using ~ after != versus when using just ==

I have a pandas df with 4 columns:

dic = {
    "a": [1,1,1,0,0,1,1],
    "b": [0,0,1,1,0,0,0],
    "c": [1,0,1,0,0,1,0],
    "d": [0,0,1,0,0,1,0],
}

df = pd.DataFrame(data=dic)

print(df)

   a  b  c  d
0  1  0  1  0
1  1  0  0  0
2  1  1  1  1
3  0  1  0  0
4  0  0  0  0
5  1  0  1  1
6  1  0  0  0

I want to subset the whole df dataframe: I want to remove all rows which have all elements zero, but just on the columns b c d, and not on a.

If I use ~ (not) operator after == I get the desired result:

names = ["b","c","d"]

df_A = df.loc[~(df[names] == 0.0).all(axis=1)]

print(df_A)

   a  b  c  d
0  1  0  1  0
2  1  1  1  1
3  0  1  0  0
5  1  0  1  1

But when I use just == I get different result:

names = ["b","c","d"]

df_B = df.loc[(df[names] != 0.0).all(axis=1)]

print(df_B)

   a  b  c  d
2  1  1  1  1

Do you have any idea why is this the case? Should these two not be the same?

Thank you.


Solution

  • You're not correctly following De Morgan's law.

    If you use the opposite condition as input, you have to replace all (AND) by any (OR):

    df_B = df.loc[(df[names] != 0.0).any(axis=1)]
    

    In English this would be:

    In short the rules for inverting input conditions are:

    Output:

       a  b  c  d
    0  1  0  1  0
    2  1  1  1  1
    3  0  1  0  0
    5  1  0  1  1