I have a dataframe like this:
   Name  COST            Timestamp
0     c     8  2023-09-20 15:14:46
1     a     8  2023-09-20 15:14:48
2     c     9  2023-09-20 15:14:55
3     b    10  2023-09-20 15:15:00
4     c     4  2023-09-20 15:15:02
5     a     9  2023-09-20 15:15:04
6     b     3  2023-09-20 15:15:12
7     a     3  2023-09-20 15:15:17
8     c     6  2023-09-20 15:15:20
9     c     6  2023-09-20 15:15:29
I want to create a new dataframe from it. It should look for each sequence containing a, b and c (the order doesn't matter). sum_cost is the sum of COST over the a, b and c rows of that sequence, and Timestamp is the timestamp of whichever of those rows comes last. If the same name occurs more than once before the sequence is complete, for example b,c,c,b,b,a, take the last c, the last b, and a.
A detailed example:
For the dataframe above, the first group takes 'a' at index 1, 'c' at index 2 (because it comes after the 'c' at index 0) and 'b' at index 3. sum_cost for this group is 27 and the timestamp is 2023-09-20 15:15:00, which belongs to 'b' at index 3 because it is the last row of the group. The next group takes 'c' at index 4, 'a' at index 5 and 'b' at index 6; sum_cost is 16 and the timestamp is 2023-09-20 15:15:12, which belongs to 'b' at index 6 because it is the last row of that sequence.
The output should look like this:
   sum_cost            Timestamp
0        27  2023-09-20 15:15:00
1        16  2023-09-20 15:15:12
Please help me with this. Thank you!
You need to build a custom grouper, and that requires a loop. Here is one way, using a custom function:
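import pandas as pd

def group_consecutive(s, target):
    # Label each run of len(target) consecutive rows whose values are
    # exactly the set `target` with 1, 2, ...; every other row gets 0.
    out = []
    i = 0
    g = 1
    while i < len(s)-len(target)+1:
        if target == set(s.iloc[i:i+len(target)]):
            out.extend([g]*len(target))
            g += 1
            i += len(target)
        else:
            out.append(0)
            i += 1
    # pad only the rows not labelled yet, so the length always matches s
    # (a fixed len(target)-1 padding overshoots when the last window matches)
    out.extend([0]*(len(s)-len(out)))
    return pd.Series(out, index=s.index)

group = group_consecutive(df['Name'], {'a', 'b', 'c'})
out = df[group>0].groupby(group).agg({'COST': 'sum', 'Timestamp': 'last'})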
Output:
   COST            Timestamp
1    27  2023-09-20 15:15:00
2    16  2023-09-20 15:15:12
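The result above keeps the COST column name and the group labels as its index, while the question asks for a sum_cost column and a 0-based index. A minimal sketch of one way to get that shape, using pandas named aggregation; the data is copied from the question, and the group labels are hard-coded here (an assumption made only to keep the snippet self-contained) instead of being computed by the grouper:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": list("cacbcabacc"),
    "COST": [8, 8, 9, 10, 4, 9, 3, 3, 6, 6],
    "Timestamp": pd.to_datetime([
        "2023-09-20 15:14:46", "2023-09-20 15:14:48", "2023-09-20 15:14:55",
        "2023-09-20 15:15:00", "2023-09-20 15:15:02", "2023-09-20 15:15:04",
        "2023-09-20 15:15:12", "2023-09-20 15:15:17", "2023-09-20 15:15:20",
        "2023-09-20 15:15:29",
    ]),
})

# Group labels as listed in the intermediate table (0 = not in any group);
# hard-coded here only so the snippet runs on its own
group = pd.Series([0, 1, 1, 1, 2, 2, 2, 0, 0, 0], index=df.index)

mask = group > 0
result = (
    df[mask]
    .groupby(group[mask])
    # named aggregation: output_column=(input_column, function)
    .agg(sum_cost=("COST", "sum"), Timestamp=("Timestamp", "last"))
    .reset_index(drop=True)  # replace the group labels with a 0-based index
)
print(result)
```

Taking 'last' for Timestamp relies on the rows already being in chronological order, as they are in the question's data.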
Intermediate with group:
   Name  COST            Timestamp  group
0     c     8  2023-09-20 15:14:46      0
1     a     8  2023-09-20 15:14:48      1
2     c     9  2023-09-20 15:14:55      1
3     b    10  2023-09-20 15:15:00      1
4     c     4  2023-09-20 15:15:02      2
5     a     9  2023-09-20 15:15:04      2
6     b     3  2023-09-20 15:15:12      2
7     a     3  2023-09-20 15:15:17      0
8     c     6  2023-09-20 15:15:20      0
9     c     6  2023-09-20 15:15:29      0
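Note that group_consecutive only matches windows of exactly three consecutive rows, so a run such as b,c,c,b,b,a from the question produces no group. If repeated names should instead collapse to their last occurrence, a single pass that remembers the latest row per name may be closer to what is asked. This is a sketch under that reading of the question, not a tested drop-in replacement; the function name sum_last_seen and its parameters are made up for illustration:

```python
import pandas as pd

def sum_last_seen(df, target, name_col="Name", cost_col="COST", ts_col="Timestamp"):
    """Scan rows in order, keeping the latest row seen for each target name.
    Once every name in `target` has been seen, emit one output row and reset."""
    latest = {}  # name -> cost of its most recent row in the current group
    rows = []
    for name, cost, ts in zip(df[name_col], df[cost_col], df[ts_col]):
        if name not in target:
            continue  # assumption: names outside the target set are ignored
        latest[name] = cost  # a repeated name overwrites its earlier row
        if len(latest) == len(target):
            # the row that completes the set is the last row of the group
            rows.append({"sum_cost": sum(latest.values()), "Timestamp": ts})
            latest = {}
    return pd.DataFrame(rows, columns=["sum_cost", "Timestamp"])

# Hypothetical data for the b,c,c,b,b,a case mentioned in the question
demo = pd.DataFrame({
    "Name": list("bccbba"),
    "COST": [1, 2, 3, 4, 5, 6],
    "Timestamp": pd.date_range("2023-09-20 15:14:00", periods=6, freq="min"),
})
demo_out = sum_last_seen(demo, {"a", "b", "c"})
print(demo_out)  # one group: last c (3) + last b (5) + a (6)
```

On the question's dataframe, `sum_last_seen(df, {'a', 'b', 'c'})` should give the same two rows as the grouper approach, because no name repeats more than once before a group completes there.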