Count the number of columns that have a value above a threshold


Suppose I have the following toy DataFrame, df:

product    customer1    customer2    customer3      
apple           40           110          120
banana         200           150          180
coconut         10             5           25
daq            120            10           30
eclair          45           190           35
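To make the example reproducible, here is a minimal sketch that builds the toy data in pandas. It assumes `product` is an ordinary column rather than the index, which matches the default 0 to 4 index in the printed output of the solution.

```python
import pandas as pd

# Toy data from the table above; 'product' is kept as a regular column
df = pd.DataFrame({
    'product':   ['apple', 'banana', 'coconut', 'daq', 'eclair'],
    'customer1': [40, 200, 10, 120, 45],
    'customer2': [110, 150, 5, 10, 190],
    'customer3': [120, 180, 25, 30, 35],
})
print(df)
```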

I would like to add a column to df that counts, for each product, how many customers bought at least 100 units of it:

product    customer1    customer2    customer3   atleast100    
apple           40           110          120             2
banana         200           150          180             3
coconut         10             5           25             0
daq            120            10           30             1
eclair          45           190           35             1

Solution

  • Select the customer columns with filter(like='customer'), then count the values greater than or equal to 100 in each row using ge(100).sum(axis=1).

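    # Keep only columns whose name contains 'customer', flag values >= 100,
    # then count the True flags across each row
    df['atleast100'] = df.filter(like='customer').ge(100).sum(axis=1)
    print(df)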
    
       product  customer1  customer2  customer3  atleast100
    0    apple         40        110        120           2
    1   banana        200        150        180           3
    2  coconut         10          5         25           0
    3      daq        120         10         30           1
    4   eclair         45        190         35           1
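The same approach generalizes to any threshold. The sketch below wraps it in a small helper; the name `count_over_threshold` and its `like` and `strict` parameters are illustrative assumptions, not part of pandas. It uses `ge` for "at least" and `gt` for a strictly "above" count (the wording in the title). Summing a boolean frame along `axis=1` counts the True values per row. Because `filter(like=...)` matches substrings of column labels, the new count column is not picked up on reruns as long as its name does not contain "customer".

```python
import pandas as pd

def count_over_threshold(frame: pd.DataFrame, threshold: float,
                         like: str = 'customer', strict: bool = False) -> pd.Series:
    """Hypothetical helper: per row, count columns whose label contains `like`
    and whose value is >= threshold (or > threshold when strict=True)."""
    selected = frame.filter(like=like)  # substring match on column labels
    flags = selected.gt(threshold) if strict else selected.ge(threshold)
    return flags.sum(axis=1)  # True counts as 1, False as 0

df = pd.DataFrame({
    'product':   ['apple', 'banana', 'coconut', 'daq', 'eclair'],
    'customer1': [40, 200, 10, 120, 45],
    'customer2': [110, 150, 5, 10, 190],
    'customer3': [120, 180, 25, 30, 35],
})

df['atleast100'] = count_over_threshold(df, 100)
print(df)

# At a threshold of 40, ge and gt differ for apple's customer1 value of exactly 40
print(count_over_threshold(df, 40).tolist())               # -> [3, 3, 0, 1, 2]
print(count_over_threshold(df, 40, strict=True).tolist())  # -> [2, 3, 0, 1, 2]
```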