pythonpandas

How to measure new change in data


Suppose you have this dataframe

d = {'date':['2019-08-25', '2019-09-01', '2019-09-08'],
    'data':[31, 31, 31]}
df_sample = pd.DataFrame(data=d)
df_sample.head()

and you want to measure how much new data comes in on average each week. For example, we had 31 new rows on 8/25 and then on 9/1 we got an additional 31 rows so thats like a 100% increase. What I want to know is on average from one week to the next how much new data comes in?

I know there is diff() and pct_change() but since this will just be 0 in these 3 samples I am wondering what would be the better approach here.


Solution

  • If your data is cumulative, you need a cumsum before pct_change:

    df_sample['change'] = df_sample['data'].cumsum().pct_change().mul(100)
    

    Output:

             date  data  change
    0  2019-08-25    31     NaN
    1  2019-09-01    31   100.0
    2  2019-09-08    31    50.0
    

    Intermediate:

             date  data  cumsum  change
    0  2019-08-25    31      31     NaN
    1  2019-09-01    31      62   100.0
    2  2019-09-08    31      93    50.0
    

    week to week

    If you want a week to week change, go with shift:

    df_sample['change'] = df_sample['data'].div(df_sample['data'].shift()).mul(100)
    

    Output:

             date  data  change
    0  2019-08-25    31     NaN
    1  2019-09-01    31   100.0
    2  2019-09-08    31   100.0