
New row which is the difference of two rows using Groupby in a pandas dataframe


I have a dataframe:

import pandas as pd
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'value': [100, 120, 130, 200, 190, 210],
    'value2': [2100, 2120, 2130, 2200, 2190, 2210],
    'state': ['init', 'mid', 'final', 'init', 'mid', 'final'],
})

For each ID, I want to add another row that holds the difference between the 'final' and 'init' rows (final minus init), with 'diff' in the state column.

The result should look like this:

import pandas as pd
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2, 1, 2],
    'value': [100, 120, 130, 200, 190, 210, 30, 10],
    'value2': [2100, 2120, 2130, 2200, 2190, 2210, 30, 10],
    'state': ['init', 'mid', 'final', 'init', 'mid', 'final', 'diff', 'diff'],
})

I have tried the following, but I get the error AttributeError: 'function' object has no attribute 'groupby':

df1 = df.copy()

df1 = df[df.state.isin(['final', 'init'])].copy
s = df1.groupby('ID', sort=False).diff().dropna.assign(id=df1['ID'].unique(), state='diff')
df = df.append(s, sort=True).sort_values("ID")

Solution

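  • The error comes from the missing call parentheses: .copy and .dropna refer to the methods themselves instead of calling them, so df1 ends up being a method rather than a dataframe. Write .copy() and .dropna(). Beyond that, what about something like this? Filter to the init and final rows, group by ID, keep only the value* columns, take the diff, drop the NaN rows, and reassign ID and state: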

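    df

       ID  value  value2  state
    0   1    100    2100   init
    1   1    120    2120    mid
    2   1    130    2130  final
    3   2    200    2200   init
    4   2    190    2190    mid
    5   2    210    2210  final

    diffs = (df[df.state.isin(['init', 'final'])]
        .groupby('ID')[['value', 'value2']]
        .diff().dropna()
        .assign(ID=df.ID.unique(), state='diff')
    )
    diffs

       value  value2  ID state
    2   30.0    30.0   1  diff
    5   10.0    10.0   2  diff

    The values are floats because diff() produces NaN for the first row of each group before dropna() removes it. The diff rows keep the index labels of the 'final' rows (2 and 5).

    Then combine back into df:

    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, diffs])
    df

       ID  value  value2  state
    0   1  100.0  2100.0   init
    1   1  120.0  2120.0    mid
    2   1  130.0  2130.0  final
    3   2  200.0  2200.0   init
    4   2  190.0  2190.0    mid
    5   2  210.0  2210.0  final
    2   1   30.0    30.0   diff
    5   2   10.0    10.0   diff

    Pass ignore_index=True to pd.concat if you want the combined frame renumbered from 0.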
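
One caveat with the approach above: it relies on 'init' coming before 'final' within each ID, and on df.ID.unique() lining up with the surviving rows. If that is not guaranteed in your data, a possible variant is to skip groupby().diff() and subtract the two selections after indexing them by ID, so pandas aligns them by label. This is only a sketch: add_diff_rows is a hypothetical helper name, and it assumes each ID has at most one 'init' row and one 'final' row.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'value': [100, 120, 130, 200, 190, 210],
    'value2': [2100, 2120, 2130, 2200, 2190, 2210],
    'state': ['init', 'mid', 'final', 'init', 'mid', 'final'],
})


def add_diff_rows(frame, value_cols=('value', 'value2')):
    """Hypothetical helper: append one 'diff' row (final minus init) per ID.

    Assumes at most one 'init' and one 'final' row per ID.
    """
    cols = list(value_cols)
    # Index both selections by ID so the subtraction aligns on ID,
    # not on row position.
    init = frame.loc[frame['state'] == 'init'].set_index('ID')[cols]
    final = frame.loc[frame['state'] == 'final'].set_index('ID')[cols]
    # IDs missing either state produce NaN after alignment and are dropped.
    diffs = (final - init).dropna().reset_index().assign(state='diff')
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
    return pd.concat([frame, diffs], ignore_index=True)


result = add_diff_rows(df)
print(result)
```

Because no NaN is introduced when every ID has both states, the value columns stay integers here, and ignore_index=True renumbers the combined frame from 0.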