I have a DataFrame that looks something like this:
C1 | C2 |
---|---|
10 | 10 |
20 | 10 |
30 | 16 |
5 | 23 |
6 | 23 |
8 | 10 |
4 | 10 |
2 | 10 |
I would like to calculate the mean value in column C1 depending on the values in column C2. The mean value is to be calculated over all values in column C1 until the value in column C2 changes again. The result table should look like this:
C1 | C2 |
---|---|
15 | 10 |
30 | 16 |
5.5 | 23 |
4.67 | 10 |
Use GroupBy.mean
by consecutive values created by compared Series.shift
ed values with Series.cumsum
, last remove first level and get original order of columns by DataFrame.reindex
:
out =(df.groupby([df['C2'].ne(df['C2'].shift()).cumsum(),'C2'],sort=False)['C1']
.mean()
.droplevel(0)
.reset_index()
.reindex(df.columns, axis=1))
print (out)
C1 C2
0 15.000000 10
1 30.000000 16
2 5.500000 23
3 4.666667 10
How it working:
print (df.assign(compared=df['C2'].ne(df['C2'].shift()),
cumsum=df['C2'].ne(df['C2'].shift()).cumsum()))
C1 C2 compared cumsum
0 10 10 True 1
1 20 10 False 1
2 30 16 True 2
3 5 23 True 3
4 6 23 False 3
5 8 10 True 4
6 4 10 False 4
7 2 10 False 4
Thank you @ouroboros1 for another easier solution with GroupBy.agg
:
out = (df.groupby(df['C2'].ne(df['C2'].shift()).cumsum(), as_index=False)
.agg({'C1': 'mean', 'C2': 'first'}))
print (out)
C1 C2
0 15.000000 10
1 30.000000 16
2 5.500000 23
3 4.666667 10