I have a timeseries like the following:
mode timestamp
0 A 2021-06-29 00:00:00-04:00
1 A 2021-06-29 01:00:00-04:00
2 A 2021-06-29 02:00:00-04:00
3 A 2021-06-29 03:00:00-04:00
4 B 2021-06-29 04:00:00-04:00
5 B 2021-06-29 05:00:00-04:00
6 B 2021-06-29 06:00:00-04:00
7 B 2021-06-29 07:00:00-04:00
8 B 2021-06-29 08:00:00-04:00
9 B 2021-06-29 09:00:00-04:00
10 B 2021-06-29 10:00:00-04:00
11 A 2021-06-29 11:00:00-04:00
12 A 2021-06-29 12:00:00-04:00
13 A 2021-06-29 13:00:00-04:00
14 A 2021-06-29 14:00:00-04:00
15 A 2021-06-29 15:00:00-04:00
16 A 2021-06-29 16:00:00-04:00
17 A 2021-06-29 17:00:00-04:00
18 A 2021-06-29 18:00:00-04:00
19 A 2021-06-29 19:00:00-04:00
20 C 2021-06-29 20:00:00-04:00
21 C 2021-06-29 21:00:00-04:00
22 C 2021-06-29 22:00:00-04:00
23 C 2021-06-29 23:00:00-04:00
24 C 2021-06-29 00:00:00-04:00
25 C 2021-06-29 01:00:00-04:00
26 C 2021-06-29 02:00:00-04:00
27 C 2021-06-29 03:00:00-04:00
I am trying to generate a list of dicts with intervals of modes with start and end time. Something like this
[
{"mode": "A", start_time: "2021-06-29 00:00:00-04:00", end_time:"2021-06-29 03:00:00-04:00" },
{"mode": "B", start_time: "2021-06-29 04:00:00-04:00", end_time:"2021-06-29 10:00:00-04:00" },
{"mode": "A", start_time: "2021-06-29 11:00:00-04:00", end_time:"2021-06-29 19:00:00-04:00" },
{"mode": "C", start_time: "2021-06-29 20:00:00-04:00", end_time:"2021-06-29 03:00:00-04:00" },
]
The timeseries is very long and looping over them appears to be very slow. Any advice on how to achieve this?
You could use a custom groupby.agg
and to_dict
:
# group successive values
group = df['mode'].ne(df['mode'].shift()).cumsum()
# aggregate and convert to dictionary
out = (df.groupby(group)
.agg(**{'mode': ('mode', 'first'),
'start_time': ('timestamp', 'first'),
'end_time': ('timestamp', 'last')})
.to_dict('records')
)
NB. depending on the desired logic, use min
/max
in place of first
/last
for the timestamps.
Output:
[{'mode': 'A', 'start_time': '2021-06-29 00:00:00-04:00', 'end_time': '2021-06-29 03:00:00-04:00'},
{'mode': 'B', 'start_time': '2021-06-29 04:00:00-04:00', 'end_time': '2021-06-29 10:00:00-04:00'},
{'mode': 'A', 'start_time': '2021-06-29 11:00:00-04:00', 'end_time': '2021-06-29 19:00:00-04:00'},
{'mode': 'C', 'start_time': '2021-06-29 20:00:00-04:00', 'end_time': '2021-06-29 03:00:00-04:00'}]