pandastime-series

Convert timeseries to intervals in pandas


I have a timeseries like the following:

    mode     timestamp               
0    A    2021-06-29 00:00:00-04:00
1    A    2021-06-29 01:00:00-04:00
2    A    2021-06-29 02:00:00-04:00
3    A    2021-06-29 03:00:00-04:00
4    B    2021-06-29 04:00:00-04:00
5    B    2021-06-29 05:00:00-04:00
6    B    2021-06-29 06:00:00-04:00
7    B    2021-06-29 07:00:00-04:00
8    B    2021-06-29 08:00:00-04:00
9    B    2021-06-29 09:00:00-04:00
10   B    2021-06-29 10:00:00-04:00
11   A    2021-06-29 11:00:00-04:00
12   A    2021-06-29 12:00:00-04:00
13   A    2021-06-29 13:00:00-04:00
14   A    2021-06-29 14:00:00-04:00
15   A    2021-06-29 15:00:00-04:00
16   A    2021-06-29 16:00:00-04:00
17   A    2021-06-29 17:00:00-04:00
18   A    2021-06-29 18:00:00-04:00
19   A    2021-06-29 19:00:00-04:00
20   C    2021-06-29 20:00:00-04:00
21   C    2021-06-29 21:00:00-04:00
22   C    2021-06-29 22:00:00-04:00
23   C    2021-06-29 23:00:00-04:00
24   C    2021-06-29 00:00:00-04:00
25   C    2021-06-29 01:00:00-04:00
26   C    2021-06-29 02:00:00-04:00
27   C    2021-06-29 03:00:00-04:00

I am trying to generate a list of dicts with intervals of modes with start and end time. Something like this

[
{"mode": "A", start_time: "2021-06-29 00:00:00-04:00", end_time:"2021-06-29 03:00:00-04:00" },
{"mode": "B", start_time: "2021-06-29 04:00:00-04:00", end_time:"2021-06-29 10:00:00-04:00" },
{"mode": "A", start_time: "2021-06-29 11:00:00-04:00", end_time:"2021-06-29 19:00:00-04:00" },
{"mode": "C", start_time: "2021-06-29 20:00:00-04:00", end_time:"2021-06-29 03:00:00-04:00" },
]

The timeseries is very long and looping over them appears to be very slow. Any advice on how to achieve this?


Solution

  • You could use a custom groupby.agg and to_dict:

    # group successive values
    group = df['mode'].ne(df['mode'].shift()).cumsum()
    
    # aggregate and convert to dictionary
    out = (df.groupby(group)
             .agg(**{'mode': ('mode', 'first'),
                     'start_time': ('timestamp', 'first'),
                     'end_time': ('timestamp', 'last')})
             .to_dict('records')
          )
    

    NB. depending on the desired logic, use min/max in place of first/last for the timestamps.

    Output:

    [{'mode': 'A', 'start_time': '2021-06-29 00:00:00-04:00', 'end_time': '2021-06-29 03:00:00-04:00'},
     {'mode': 'B', 'start_time': '2021-06-29 04:00:00-04:00', 'end_time': '2021-06-29 10:00:00-04:00'},
     {'mode': 'A', 'start_time': '2021-06-29 11:00:00-04:00', 'end_time': '2021-06-29 19:00:00-04:00'},
     {'mode': 'C', 'start_time': '2021-06-29 20:00:00-04:00', 'end_time': '2021-06-29 03:00:00-04:00'}]