python pandas series

Why does a pandas boolean series need () to work with boolean operators?


I've seen that if you index a dataframe with a boolean series that has the same length as the dataframe's rows, it filters the dataframe. However, if we pass a condition instead of a precomputed boolean series (like df['col']==value) and want to apply a boolean operator to that condition (like ~), it does not work, even though the condition's result is a boolean series. It only works if the condition is surrounded by parentheses. In other words, df[~(df['col']>value)] works and df[~df['col']>value] does not; the only difference is the parentheses.

I thought the parentheses were doing something to the boolean series resulting from df['col']>value, like casting it into another kind of object that supports operations such as ~. But they are not: type(df['col']>value) and type((df['col']>value)) are the same, namely "pandas.core.series.Series". So what are those parentheses doing that makes ~ work on the boolean series produced by the condition?

Moreover, if you have two boolean series derived from applying conditions to a dataframe, like

series_a=df['col']>value and series_b=df['col']==value, and you combine them with the & operator as df[series_a & series_b], it works fine. But computing them inline does not work: df[df['col']>value & df['col']==value] raises TypeError: unsupported operand type(s) for &: 'int' and 'IntegerArray'. From that error I would assume operator precedence is involved, since it seems to be applying & to an IntegerArray, probably evaluating df['col'] > (value & df['col']) == value. But I would like to ask to confirm.

Example: Suppose we have a dataframe with a column tag whose values are either A or B

import pandas as pd
import numpy as np
import random

df = pd.DataFrame({'tag': [random.choice(['A', 'B']) for i in range(100)]})

If I try to filter doing this:

df[~(df['tag']=='A')]

It works, but if I do it without those parentheses, it fails with this error: TypeError: bad operand type for unary ~: 'str'

df[~df['tag']=='A']

Solution

  • It's a question of operator precedence. When an expression contains two operators (here ~ and >), Python has to decide which one to apply first. In

    ~df['col']>value
    

    ~ has higher precedence, so it goes first. You negated the column and then compared the result. It's the same as (~(df['col'])) > value.

    If you want to compare and then negate, you have to use parentheses to avoid the unwanted order of operations. Expressions inside parentheses are evaluated first. In

    ~(df['col']>value)
    

    the comparison is done first.
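
One way to check the grouping without running any pandas code is to ask Python how it parses each expression. The sketch below uses the standard-library ast module; the names df and value are only placeholders inside the parsed strings and are never evaluated. It also covers the & case from the question, on the assumption that the expression is written exactly as shown there.

```python
import ast

# Without parentheses the outermost operation is the comparison,
# so ~ has already been applied to df['col'] alone.
unparenthesized = ast.parse("~df['col'] > value", mode="eval").body
print(type(unparenthesized).__name__)        # Compare
print(type(unparenthesized.left).__name__)   # UnaryOp

# With parentheses the outermost operation is ~, applied to the comparison.
parenthesized = ast.parse("~(df['col'] > value)", mode="eval").body
print(type(parenthesized).__name__)          # UnaryOp
print(type(parenthesized.operand).__name__)  # Compare

# & also binds tighter than > and ==, so this parses as a chained
# comparison whose middle operand is (value & df['col']).
chained = ast.parse("df['col'] > value & df['col'] == value", mode="eval").body
print([type(op).__name__ for op in chained.ops])  # ['Gt', 'Eq']
print(type(chained.comparators[0]).__name__)      # BinOp
```

So the guess in the question is right: the unparenthesized version is read as df['col'] > (value & df['col']) == value. Wrapping each condition in parentheses, as in df[(df['col']>value) & (df['col']==value)], makes the comparisons run before &.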