pythonpandasdataframedatetimetimeslots

How to get the timeslots from the start time and end time in the pandas dataframe


I have a pandas dataframe where it has start_time, end_time and booking duration.

Please find below example dataframe

id Start_time End_time Duration
1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 0:30:00
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00
3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 1:00:00
4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 1:40:00

I am trying to get the timeslots from the above dataframe

Expected output:

id Start_time End_time Duration Timeslots
1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 0:30:00 9 - 10
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00 9-10
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00 10-11
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00 11-12
3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 1:00:00 10 - 11
4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 1:40:00 8-9
4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 1:40:00 9-10

What I have tried till now

I can get the slots from start_time and end_time but i am missing the expected output

id Start_time End_time Duration TimeSlot
1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 0:30:00 9-9:30
2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 3:10:00 9-12:10
3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 1:00:00 10-11
4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 1:40:00 8 - 9:40

Can anyone give some hints please


Solution

  • Try:

    def get_slots(row):
        dti = pd.date_range(row['Start_time'].floor('H'), 
                            row['End_time'].ceil('H'), freq='H')
        return [f"{s.hour:02}-{e.hour:02}" for s, e in zip(dti, dti[1:])]
        
    out = df.assign(Timeslots=df.apply(get_slots, axis=1)).explode('Timeslots')
    print(out)
    
    # Output:
       id          Start_time            End_time        Duration Timeslots
    0   1 2013-11-20 09:00:00 2013-11-20 09:30:00 0 days 00:30:00     09-10
    1   2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00     09-10
    1   2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00     10-11
    1   2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00     11-12
    1   2 2013-11-20 09:00:00 2013-11-20 12:10:00 0 days 03:10:00     12-13
    2   3 2013-11-20 10:00:00 2013-11-20 11:00:00 0 days 01:00:00     10-11
    3   4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 01:40:00     08-09
    3   4 2013-11-20 08:00:00 2013-11-20 09:40:00 0 days 01:40:00     09-10
    

    Setup to be reproducible:

    import pandas as pd
    from pandas import Timestamp, Timedelta
    
    data = {
    'id': [1, 2, 3, 4],
    'Start_time': [Timestamp('2013-11-20 09:00:00'), Timestamp('2013-11-20 09:00:00'), 
                   Timestamp('2013-11-20 10:00:00'), Timestamp('2013-11-20 08:00:00')], 
    'End_time': [Timestamp('2013-11-20 09:30:00'), Timestamp('2013-11-20 12:10:00'),
                 Timestamp('2013-11-20 11:00:00'), Timestamp('2013-11-20 09:40:00')],
    'Duration': [Timedelta('0 days 00:30:00'), Timedelta('0 days 03:10:00'),
                 Timedelta('0 days 01:00:00'), Timedelta('0 days 01:40:00')]
    }
    
    df = pd.DataFrame(data)