I am looking for a way to achieve column total_diff as shown in this table.
If you take John for example, total trainging_hours is 250. No hours were recorded in month1 to month4, so total_diff remained 250.
In month5 50 hours was recorded so total_diff for month5 is (250-50) = 200, month6(200-60) = 140 and so on. Same approach for the other names. enter image description here
I tried below code but did not work as expected.
mask = ~df['month'].isin(['John_Total', 'Amber_Total', 'Barron_Total'])
df.loc[mask, 'total_diff'] = df[mask].groupby("name")['training_hours'].cumsum()
df['total_diff'] = df.groupby('name')['total_diff'].fillna(df['training_hours'])
df
Any help on how to achieve this in python ?
diff = df["training_hours"].where(
df["month"].str.contains("Total"), -df["training_hours"]
)
df["total_diff"] = diff.groupby(df["name"]).cumsum()
name month training_hours total_diff
0 John John_Total 250 250
1 John month1 0 250
2 John month2 0 250
3 John month3 0 250
4 John month4 0 250
5 John month5 50 200
6 John month6 60 140
7 John month7 0 140
8 John month8 40 100
9 John month9 70 30
10 John month10 0 30
11 John month11 30 0
12 Amber Amber_Total 300 300
13 Amber month1 60 240
14 Amber month2 70 170
15 Amber month3 0 170
16 Amber month4 50 120
17 Amber month5 20 100
18 Amber month6 40 60
19 Amber month7 0 60
20 Amber month8 50 10
21 Amber month9 10 0
22 Amber month10 0 0
23 Amber month11 0 0
24 Barron Barron_Total 270 270
25 Barron month1 0 270
26 Barron month2 70 200
27 Barron month3 50 150
28 Barron month4 50 100
29 Barron month5 0 100
30 Barron month6 30 70
31 Barron month7 0 70
32 Barron month8 40 30
33 Barron month9 10 20
34 Barron month10 5 15
35 Barron month11 15 0
Example data:
data = {
'name': ['John', 'John', 'John', 'John', 'John', 'John', 'John', 'John', 'John', 'John', 'John', 'John',
'Amber', 'Amber', 'Amber', 'Amber', 'Amber', 'Amber', 'Amber', 'Amber', 'Amber', 'Amber', 'Amber', 'Amber',
'Barron', 'Barron', 'Barron', 'Barron', 'Barron', 'Barron', 'Barron', 'Barron', 'Barron', 'Barron', 'Barron', 'Barron'],
'month': ['John_Total', 'month1', 'month2', 'month3', 'month4', 'month5', 'month6', 'month7', 'month8', 'month9', 'month10', 'month11',
'Amber_Total', 'month1', 'month2', 'month3', 'month4', 'month5', 'month6', 'month7', 'month8', 'month9', 'month10', 'month11',
'Barron_Total', 'month1', 'month2', 'month3', 'month4', 'month5', 'month6', 'month7', 'month8', 'month9', 'month10', 'month11'],
'training_hours': [250, 0, 0, 0, 0, 50, 60, 0, 40, 70, 0, 30,
300, 60, 70, 0, 50, 20, 40, 0, 50, 10, 0, 0,
270, 0, 70, 50, 50, 0, 30, 0, 40, 10, 5, 15]
}
df = pd.DataFrame(data)