I have a dataframe
import pandas as pd
df = pd.DataFrame({
'ID': [1, 1, 1, 2, 2, 2],
'value': [100, 120, 130, 200, 190, 210],
'value2': [2100, 2120, 2130, 2200, 2190, 2210],
'state': ['init','mid', 'final', 'init', 'mid', 'final'],
})
I want to add another row, which is the difference of 'final' and 'init' in the state column for each ID.
The result should look like
import pandas as pd
df = pd.DataFrame({
'ID': [1, 1, 1, 2, 2, 2, 1, 2],
'value': [100, 120, 130, 200, 190, 210, 30, 10],
'value2': [2100, 2120, 2130, 2200, 2190, 2210, 100, 10],
'state': ['init','mid', 'final', 'init', 'mid', 'final', 'diff', 'diff'],
})
I have tried the following. But I get AttributeError: 'function' object has no attribute 'groupby' error
df1 = df.copy()
df1 = df[df.state.isin(['final', 'init'])].copy
s = df1.groupby('ID', sort=False).diff().dropna.assign(id=df1['ID'].unique(), state='diff')
df = df.append(s, sort=True).sort_values("ID")
What about something like this? Group by ID
, keep only the value*
cols, take the diff, and reassign ID
and state
:
df
ID | value | value2 | state | |
---|---|---|---|---|
0 | 1 | 100 | 2100 | init |
1 | 1 | 120 | 2120 | mid |
2 | 1 | 130 | 2130 | final |
3 | 2 | 200 | 2200 | init |
4 | 2 | 190 | 2190 | mid |
5 | 2 | 210 | 2210 | final |
diffs = (df[df.state.isin(['init', 'final'])]
.groupby('ID')[['value', 'value2']]
.diff().dropna()
.assign(ID=df.ID.unique(), state='diff')
)
ID | value | value2 | state | |
---|---|---|---|---|
0 | 1 | 30.0 | 30.0 | diff |
1 | 2 | 10.0 | 10.0 | diff |
Then combine back into df
:
df = df.append(diffs)
ID | value | value2 | state | |
---|---|---|---|---|
0 | 1 | 100.0 | 2100.0 | init |
1 | 1 | 120.0 | 2120.0 | mid |
2 | 1 | 130.0 | 2130.0 | final |
3 | 2 | 200.0 | 2200.0 | init |
4 | 2 | 190.0 | 2190.0 | mid |
5 | 2 | 210.0 | 2210.0 | final |
0 | 1 | 30.0 | 30.0 | diff |
1 | 2 | 10.0 | 10.0 | diff |