pandasmulti-indexresample

pandas resample multi-channel data


Not sure if this is a multi-index or group by issue.

Given the data set csv example below:

'device','time','data'
1,2021-07-03 00:00:04,299
1,2021-07-03 00:02:34,300
1,2021-07-03 00:11:09,299
1,2021-07-03 00:13:38,299
1,2021-07-03 00:14:27,300
1,2021-07-03 00:19:25,300
1,2021-07-03 00:20:15,299
1,2021-07-03 00:20:23,300

2,2021-07-03 00:00:53,353
2,2021-07-03 00:07:34,352
2,2021-07-03 00:08:10,353
2,2021-07-03 00:12:27,352
2,2021-07-03 00:14:56,353
2,2021-07-03 00:17:00,352
2,2021-07-03 00:18:10,353
2,2021-07-03 00:19:27,352
2,2021-07-03 00:20:25,353

3,2021-07-03 00:07:44,336
3,2021-07-03 00:21:05,335
3,2021-07-03 00:21:54,336

4,2021-07-03 00:00:38,342
4,2021-07-03 00:02:19,343
4,2021-07-03 00:03:09,342
4,2021-07-03 00:22:46,343

I want to resample each device's data to 5 minute intervals, forward fill and back fill at start, ideally all starting at a specific time, and all devices synchronized to same time stamps (every 5 minutes from 00:00), e.g. from midnight, and running for 24 hours.

I've tried all these iterations from other answers, but not sure I'm even going down the right track:

  1. df = df.set_index('ts').groupby('device').resample('5T').ffill()
  2. df = df.groupby('device').apply(lambda x: x.set_index('ts').value.resample('5T').asfreq())
  3. df = df.set_index('ts').groupby('device').resample('5T').ffill().reset_index('ts')

The data needs to be rendered in a heatmap. I had this working with a previous database using timescale and the time_bucket feature, however now I can't use that DB or extension and need to run on ancient Postgres V9.3

Any help is much appreciated!


Solution

  • If you want to create a heatmap, maybe you should use pivot_table. However you have to use an aggregate function to merge values on the same interval (here the mean)

    piv = df.pivot_table(index='time', columns='device', values='data')
    idx = pd.date_range(piv.index.min().normalize(), periods=288, freq='5T')
    piv = piv.resample('5T').mean().reindex(idx).ffill().bfill()
    

    Output:

    >>> piv
    device                        1           2      3           4
    2021-07-03 00:00:00  299.500000  353.000000  336.0  342.333333
    2021-07-03 00:05:00  299.500000  352.500000  336.0  342.333333
    2021-07-03 00:10:00  299.333333  352.500000  336.0  342.333333
    2021-07-03 00:15:00  300.000000  352.333333  336.0  342.333333
    2021-07-03 00:20:00  299.500000  353.000000  335.5  343.000000
    ...                         ...         ...    ...         ...
    2021-07-03 23:35:00  299.500000  353.000000  335.5  343.000000
    2021-07-03 23:40:00  299.500000  353.000000  335.5  343.000000
    2021-07-03 23:45:00  299.500000  353.000000  335.5  343.000000
    2021-07-03 23:50:00  299.500000  353.000000  335.5  343.000000
    2021-07-03 23:55:00  299.500000  353.000000  335.5  343.000000
    
    [288 rows x 4 columns]
    

    Now you can simply use sns.heatmap(piv) to get the expected figure.