python pandas dataframe

Pandas's .gt() and .lt() not working when chained together


I'm playing around with the pipe | and ampersand & operators, as well as the .gt() and .lt() Series methods, to see how they work together.

I'm looking at a column in a DataFrame with values from 0.00 to 1.00.

Combining >, <, and & works fine, and so does combining .gt(), .lt(), and &. However, if I chain the calls as .gt().lt(), I get a different result.

In my example I'm using .gt(0.7).lt(0.9), but this returns True for values <= 0.7. If I change the order to .lt(0.9).gt(0.7), I get True for values < 0.9.

I can always write it as df['column'].gt(0.7) & df['column'].lt(0.9). I'm just wondering whether there's a way of chaining .gt().lt().


Solution

  • The chained call does not test the original values twice: .lt(0.9) is applied to the boolean Series returned by .gt(0.7), and in Python True == 1 and False == 0 (see bool). Suppose we have:

    import pandas as pd
    
    data = {'col': [0.5, 0.8, 1]}
    df = pd.DataFrame(data)
    
    df['col'].gt(0.7)
    

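    When we chain .lt(0.9), the check takes place on this boolean result of .gt(0.7), not on the original values. The output of .gt(0.7) is shown below; each comment shows what .lt(0.9) then evaluates for that row: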

    0    False # 0 < 0.9 (True)
    1     True # 1 < 0.9 (False)
    2     True # 1 < 0.9 (False)
    Name: col, dtype: bool
    

    Use Series.between instead, with the inclusive parameter ('both', 'neither', 'left', or 'right') controlling whether each bound is included:

    df['col'].between(0.7, 0.9, inclusive='neither')
    
    0    False # 0.5
    1     True # 0.8
    2    False # 1
    Name: col, dtype: bool
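
To see the three variants side by side, here is a minimal sketch using the same sample data. The variable names (s, chained, combined, ranged) are just for illustration, and the string form of inclusive assumes pandas 1.3 or later.

```python
import pandas as pd

df = pd.DataFrame({"col": [0.5, 0.8, 1.0]})
s = df["col"]

# Chaining: .lt(0.9) is applied to the boolean output of .gt(0.7),
# so it compares True (1) and False (0) against 0.9.
chained = s.gt(0.7).lt(0.9)

# Two independent comparisons on the original values, combined with &.
combined = s.gt(0.7) & s.lt(0.9)

# The same strict range test in a single call
# (assumption: pandas >= 1.3 for the string value of `inclusive`).
ranged = s.between(0.7, 0.9, inclusive="neither")

print(chained.tolist())   # [True, False, False]
print(combined.tolist())  # [False, True, False]
print(ranged.tolist())    # [False, True, False]
```

The chained version is True only where .gt(0.7) was False, which is why it appears to select values <= 0.7, while the other two agree on the row holding 0.8.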