pythonpandasdataframegraphlabsframe

how to take average of the values per three days in a timeseries data


I have a time-series data that look like this:

datetime    | value
2016-01-01  | 1
2016-01-02  | 14
2016-01-03  | 3
2016-01-04  | 15
2016-01-05  | 5
2016-01-06  | 4
2016-01-07  | 7
2016-01-08  | 15

What I want to obtain is the average per three days (as well as per a week) and keep the last day as the index like this:

datetime    | value
2016-01-03  | 6
2016-01-06  | 8
2016-01-08  | 11

You may notice that the remaining 2 records need to be just averaged. Is there an handy way of doing this either in pandas dataframe or graphlab SFrame? If someone can share a relevant resource, that would be great!

I appreciate any help!


Solution

  • You can use agg by numpy array created by floor division:

    print (np.arange(len(df.index)) // 3)
    [0 0 0 1 1 1 2 2]
    
    df = df.groupby(np.arange(len(df.index)) // 3).agg({'datetime': 'last', 'value': 'mean'})
    print (df)
         datetime  value
    0  2016-01-03      6
    1  2016-01-06      8
    2  2016-01-08     11