I have a collection of user data as follows:
user | start | end |
---|---|---|
John Doe | 2025-03-21 11:30:35 | 2025-03-21 13:05:26 |
... | ... | ... |
Jane Doe | 2023-12-31 01:02:03 | 2024-01-02 03:04:05 |
Each user has a start and end datetime of some activity. I would like to place this temporal range in the index so I can quickly query the dataframe to see which users were active during a certain date/time range like so:
df['2024-01-01:2024-01-31']
Pandas has Period objects, but these seem to only support a specific year, day, or minute, not an arbitrary start and end datetime. Pandas also has MultiIndex indices, but these seem to be designed for hierarchical categorical labels, not for time ranges. Any other ideas for how to represent this time range in an index?
You can use pandas' IntervalIndex, which stores an arbitrary start and end for each row:
import pandas as pd
data = {
'user': ['John Doe', 'Jane Doe'],
'start': [pd.Timestamp('2025-03-21 11:30:35'), pd.Timestamp('2023-12-31 01:02:03')],
'end': [pd.Timestamp('2025-03-21 13:05:26'), pd.Timestamp('2024-01-02 03:04:05')],
}
df = pd.DataFrame(data)
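# Build an IntervalIndex from the start/end columns; closed='both' includes both endpoints
interval_index = pd.IntervalIndex.from_arrays(df['start'], df['end'], closed='both')
df.set_index(interval_index, inplace=True)
# The start/end values now live in the index, so the columns are redundant
df.drop(columns=['start', 'end'], inplace=True)
# Select the users whose interval contains a given timestamp
query_time = pd.Timestamp("2024-01-01 12:00:00")
active_users = df[df.index.contains(query_time)]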
print(active_users)
Output:
D:\python>python test.py
                                                user
[2023-12-31 01:02:03, 2024-01-02 03:04:05]  Jane Doe
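The example above checks a single timestamp, while the question asks about a date/time range. A minimal sketch of one way to do that is IntervalIndex.overlaps with a pd.Interval as the query window. The window bounds and the name active_in_window are illustrative assumptions, not part of the original answer:

```python
import pandas as pd

# Same sample data as above, with the activity ranges stored in an IntervalIndex
df = pd.DataFrame(
    {"user": ["John Doe", "Jane Doe"]},
    index=pd.IntervalIndex.from_arrays(
        pd.to_datetime(["2025-03-21 11:30:35", "2023-12-31 01:02:03"]),
        pd.to_datetime(["2025-03-21 13:05:26", "2024-01-02 03:04:05"]),
        closed="both",
    ),
)

# Query window covering January 2024 (the exact end bound is an assumed choice)
window = pd.Interval(
    pd.Timestamp("2024-01-01"),
    pd.Timestamp("2024-01-31 23:59:59"),
    closed="both",
)

# overlaps() returns one boolean per row: True where the row's interval
# shares at least one point with the query window
active_in_window = df[df.index.overlaps(window)]
print(active_in_window["user"].tolist())
```

With the sample data, only Jane Doe's range (2023-12-31 to 2024-01-02) intersects January 2024. Note that this matches any partial overlap; if you need users whose activity lies entirely inside the window, compare df.index.left and df.index.right against the window bounds instead.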