This is my DataFrame:
import pandas as pd
df = pd.DataFrame(
{
'group': list('xxxxyyy'),
'open': [100, 150, 200, 160, 300, 150, 170],
'close': [105, 150, 200, 160, 350, 150, 170],
'stop': [104, 104, 104, 104, 400, 400, 400]
}
)
Expected output is returning group x
based on the group
column:
group open close stop
0 x 100 105 104
1 x 150 150 104
2 x 200 200 104
3 x 160 160 104
Logic:
I want to check if df.stop.iloc[0]
for each group is between df.open.iloc[0]
and df.close.iloc[0]
. And if it is between these two, I want to return that entire group.
This is my attempt. It works but I think there is a better way to do it. Note that in the if
clause, both conditions are needed to be checked.
def func(df):
s = df.stop.iloc[0]
o = df.open.iloc[0]
c = df.close.iloc[0]
if (o <= s <= c) or (c <= s <= o):
return df
out = df.groupby('group').apply(func).reset_index(drop=True)
You could groupby.first
, then build a mask with isin
:
tmp = df.groupby('group').first()
keep = tmp.index[tmp['stop'].between(tmp['open'], tmp['close'])
|tmp['stop'].between(tmp['close'], tmp['open'])
]
df[df['group'].isin(keep)]
Output:
group open close stop
0 x 100 105 104
1 x 150 150 104
2 x 200 200 104
3 x 160 160 104
Intermediate tmp
:
open close stop o<=s<=c c<=s<=o OR
group
x 100 105 104 True False True
y 300 350 400 False False False
groupby.filter
:def func(df):
s = df.stop.iloc[0]
o = df.open.iloc[0]
c = df.close.iloc[0]
return (o <= s <= c) or (c <= s <= o)
df.groupby('group').filter(func)
g = df.groupby('group')
tmp = g.first()
keep = tmp.index[tmp['stop'].between(tmp['open'], tmp['close'])
|tmp['stop'].between(tmp['close'], tmp['open'])
]
for x in keep:
print(f'group {x}')
print(g.get_group(x))
Output:
group x
group open close stop
0 x 100 105 104
1 x 150 150 104
2 x 200 200 104
3 x 160 160 104
For this particular case, you can even skip the groupby
and replace it with drop_duplicates
:
tmp = df.drop_duplicates('group')
keep = tmp.loc[tmp['stop'].between(tmp['open'], tmp['close'])
|tmp['stop'].between(tmp['close'], tmp['open']),
'group']
out = df[df['group'].isin(keep)]
Tested on ~ 7K rows with groups of 3 rows
# filter
266 ms ± 10.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# groupby.first + isin
1.63 ms ± 24.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# drop_duplicates + isin
1.34 ms ± 34.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)