pythonpandasdataframeaggregatepivot-table

How to evaluate a formula based on different values of one colume in Python Dataframe


I have a dataframe and I want to evaluate a formula such like result = 2*apple - melon - orange and group by testid.

my df is like below

df = pd.DataFrame({'testid':(1,2,1,2,1,2),'Name':('apple','apple','melon','melon','orange','orange'), 'A': (1,2,10,20,5,5), 'B': (1,5,4,2,3,1)})
testid Name A B
1 apple 1 1
2 apple 2 5
1 melon 10 4
2 melon 20 2
1 orange 5 3
2 orange 5 1

and I want my result as for each testid's column A and B, do apple * 2 - melon - orange. and then save into new column result_A and result_B, group by testid

testid result_A result_B
1 -13 -5
2 -21 7

which function can get this result?

I tried to use df.eval() like

eq = 'df[df.Name==\'apple\',[\'A\',\'B\']] *2 - df[df.Name==\'melon\',[\'A\',\'B\']] - df[df.Name==\'orange\',[\'A\',\'B\']]'
df.eval(eq)

but it tells me

UndefinedVariableError: name 'df' is not defined

i realized i should use pd.eval(engine='python') instead of df.eval() and now the problem is that different subset of df return different index which result them cannot add or minus

df[df.Name=='apple'][['A','B']]*2

this gives me

A B
0 2 2
1 4 10

however

df[df.Name=='melon'][['A','B']]

this gives me

A B
2 10 4
3 20 2

so they can't add or minus together because index not match


Solution

  • Try this (the outside brackets are part of the code, copy them too when you copy this code).

    (
        df
        .pivot('testid', 'Name', ['A', 'B'])# reshape df into a wide multiindex df
        .stack(0)                           # remove multiindex
        .eval('apple*2-melon-orange')       # apply formula (creates a Series)
        .unstack()                          # convert to df
        .add_prefix('result_')              # add prefix to column names
        .reset_index()
    )
    

    enter image description here

    eval cannot take formula such as max(apple,0), is there another way to evaluate formula with max or if else?

    max(apple, 0) can be done using clip() before eval(). Use assign() to modify values corresponding to apple. Note that x inside assign() refers to the dataframe that assign() is called on.

    (
        df
        .pivot('testid', 'Name', ['A', 'B'])
        .stack(0)
        .assign(apple=lambda x: x['apple'].clip(0)) # replace all negative apples to 0s
        .eval('apple*2-melon-orange')
        .unstack()
        .add_prefix('result_')
        .reset_index()
    )
    

    I need to evaluate ( melon if ( apple >1 ) else orange ) * 2 - melon - orange

    This can be handled using where() inside assign() (yet again) to choose values either from orange or melon.

    (
        df
        .pivot('testid', 'Name', ['A', 'B'])
        .stack(0)
        # conditionally choose between melon and orange depending on the value of apple
        .assign(apple=lambda x: x['melon'].where(x['apple']>1, x['orange'])) 
        .eval('apple*2-melon-orange')
        .unstack()
        .add_prefix('result_')
        .reset_index()
    )