python-3.xpandasinequality

What is the best way to code a compound inequalities on a PANDAS dataframe?


In the below data frame I'd like to evaluate the following compound inequality:

df['B'] <= df['E'].shift(1) <= df['A']

df

                 A       B       C    D       E       F       G
Date                                                           
2021-05-17  126.93  125.17  125.90  0.0    0.00    0.00  125.90
2021-05-18  126.99  124.78  124.48  0.0    0.00    0.00  124.48
2021-05-19  124.92  122.86  124.32  1.0  122.86  135.15  124.32
2021-05-20  127.72  125.10  126.93  1.0  122.86  135.15  126.93
2021-05-21  128.00  125.21  125.06 -1.0  128.00  115.20  125.06
2021-05-24  127.94  125.94  126.72 -1.0  128.00  115.20  126.72
2021-05-25  128.32  126.32  126.53  0.0    0.00    0.00  128.00
2021-05-26  127.39  126.42  126.48  0.0    0.00    0.00  126.48
2021-05-27  127.64  125.08  124.91  0.0    0.00    0.00  124.91
2021-05-28  125.80  124.55  124.24 -1.0  125.80  113.22  124.24
2021-06-01  125.35  123.94  123.91 -1.0  125.80  113.22  123.91
2021-06-02  125.24  124.05  124.69  1.0  124.05  136.46  124.69
2021-06-03  124.85  123.13  123.18  0.0    0.00    0.00  124.05
2021-06-04  126.16  123.85  125.52  1.0  123.85  136.23  125.52
2021-06-07  126.32  124.83  125.53  1.0  123.85  136.23  125.53
2021-06-08  128.46  126.21  126.37  1.0  123.85  136.23  126.37

Evaluating the above statement produces:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

As a result, I've had to decompose the inequality into a conjunction:

 (df['B'] <= df['E'].shift(1)) & (df['E'].shift(1) <= df['A'])

Unfortunately, this is onerous to read and write since I have many of these statements embedded within another function.

I also considered the following chained statement:

df['B'].le(df['E'].shift(1).le(df['A'])) 

but that won't work because the outer .le would be evaluating a boolean rather than a number.

Given all of this, what is the best (i.e. concise and computationally efficient) way to write a compound inequality to be evaluated per row across multiple columns of a data frame?


Solution

  • A simple way to do that is to use between:

    >>> df['E'].shift().between(df['B'], df['A'], inclusive='both')
    Date
    2021-05-17    False
    2021-05-18    False
    2021-05-19    False
    2021-05-20    False
    2021-05-21    False
    2021-05-24    False
    2021-05-25    False
    2021-05-26    False
    2021-05-27    False
    2021-05-28    False
    2021-06-01    False
    2021-06-02    False
    2021-06-03     True
    2021-06-04    False
    2021-06-07    False
    2021-06-08    False
    dtype: bool