I have a dataframe and I want to evaluate a formula such like
result = 2*apple - melon - orange
and group by testid.
my df is like below
df = pd.DataFrame({'testid':(1,2,1,2,1,2),'Name':('apple','apple','melon','melon','orange','orange'), 'A': (1,2,10,20,5,5), 'B': (1,5,4,2,3,1)})
testid | Name | A | B |
---|---|---|---|
1 | apple | 1 | 1 |
2 | apple | 2 | 5 |
1 | melon | 10 | 4 |
2 | melon | 20 | 2 |
1 | orange | 5 | 3 |
2 | orange | 5 | 1 |
and I want my result as for each testid's column A and B, do apple * 2 - melon - orange
. and then save into new column result_A and result_B, group by testid
testid | result_A | result_B |
---|---|---|
1 | -13 | -5 |
2 | -21 | 7 |
which function can get this result?
I tried to use df.eval() like
eq = 'df[df.Name==\'apple\',[\'A\',\'B\']] *2 - df[df.Name==\'melon\',[\'A\',\'B\']] - df[df.Name==\'orange\',[\'A\',\'B\']]'
df.eval(eq)
but it tells me
UndefinedVariableError: name 'df' is not defined
i realized i should use pd.eval(engine='python') instead of df.eval() and now the problem is that different subset of df return different index which result them cannot add or minus
df[df.Name=='apple'][['A','B']]*2
this gives me
A | B | |
---|---|---|
0 | 2 | 2 |
1 | 4 | 10 |
however
df[df.Name=='melon'][['A','B']]
this gives me
A | B | |
---|---|---|
2 | 10 | 4 |
3 | 20 | 2 |
so they can't add or minus together because index not match
Try this (the outside brackets are part of the code, copy them too when you copy this code).
(
df
.pivot('testid', 'Name', ['A', 'B'])# reshape df into a wide multiindex df
.stack(0) # remove multiindex
.eval('apple*2-melon-orange') # apply formula (creates a Series)
.unstack() # convert to df
.add_prefix('result_') # add prefix to column names
.reset_index()
)
eval cannot take formula such as
max(apple,0)
, is there another way to evaluate formula with max or if else?
max(apple, 0)
can be done using clip()
before eval()
. Use assign()
to modify values corresponding to apple
. Note that x
inside assign()
refers to the dataframe that assign()
is called on.
(
df
.pivot('testid', 'Name', ['A', 'B'])
.stack(0)
.assign(apple=lambda x: x['apple'].clip(0)) # replace all negative apples to 0s
.eval('apple*2-melon-orange')
.unstack()
.add_prefix('result_')
.reset_index()
)
I need to evaluate
( melon if ( apple >1 ) else orange ) * 2 - melon - orange
This can be handled using where()
inside assign()
(yet again) to choose values either from orange
or melon
.
(
df
.pivot('testid', 'Name', ['A', 'B'])
.stack(0)
# conditionally choose between melon and orange depending on the value of apple
.assign(apple=lambda x: x['melon'].where(x['apple']>1, x['orange']))
.eval('apple*2-melon-orange')
.unstack()
.add_prefix('result_')
.reset_index()
)