python-3.x pandas dataframe python-datetime pandas-timeindex

Finding the midpoint between values in a pandas datetime column and creating start and end time columns based on the midpoints


Here is my setup code:

    import pandas as pd

    df = {'Datetime': ['2020-12-01 00:00:00', '2020-12-01 01:00:00', '2020-12-01 02:00:00',
                       '2020-12-01 03:00:00', '2020-12-01 04:00:00', '2020-12-01 05:00:00',
                       '2020-12-01 06:00:00', '2020-12-01 09:00:00', '2020-12-01 12:00:00',
                       '2020-12-01 18:00:00', '2020-12-02 00:00:00']}

    df = pd.DataFrame(df, columns=['Datetime'])

    df["Datetime"] = pd.to_datetime(df['Datetime'])

    df

That produces a dataframe with the following form:

    Datetime
0   2020-12-01 00:00:00
1   2020-12-01 01:00:00
2   2020-12-01 02:00:00
3   2020-12-01 03:00:00
4   2020-12-01 04:00:00
5   2020-12-01 05:00:00
6   2020-12-01 06:00:00
7   2020-12-01 09:00:00
8   2020-12-01 12:00:00
9   2020-12-01 18:00:00
10  2020-12-02 00:00:00

I want to find the midpoint between consecutive timestamps and create two new columns, "Start Time" and "End Time". "Start Time" is the midpoint between a row's time and the previous row's time, if there is a previous row. "End Time" is the midpoint between a row's time and the next row's time, if there is a next row. Where no such neighbour exists (the first row for "Start Time", the last row for "End Time"), the row's own time is used.
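The rule can be pinned down with a small plain-Python reference, a sketch that loops over a sorted list of timestamps rather than using vectorized pandas operations. The helper name midpoint_bounds is hypothetical. Note that the midpoint is written as a + (b - a) / 2: adding two Timestamps directly raises a TypeError, but subtracting them gives a Timedelta that can be halved and added back.

```python
import pandas as pd

def midpoint_bounds(times):
    """Hypothetical helper: return (start, end) pairs for a sorted list of Timestamps.

    start = midpoint with the previous time (or the time itself for the first entry)
    end   = midpoint with the next time (or the time itself for the last entry)
    """
    bounds = []
    for i, t in enumerate(times):
        # a + (b - a) / 2: Timestamp - Timestamp is a Timedelta, which can be halved
        start = times[i - 1] + (t - times[i - 1]) / 2 if i > 0 else t
        end = t + (times[i + 1] - t) / 2 if i < len(times) - 1 else t
        bounds.append((start, end))
    return bounds

# Last three timestamps from the question
times = [pd.Timestamp(s) for s in ['2020-12-01 12:00', '2020-12-01 18:00', '2020-12-02 00:00']]
for t, (start, end) in zip(times, midpoint_bounds(times)):
    print(t, start, end)
# last line: 2020-12-02 00:00:00 2020-12-01 21:00:00 2020-12-02 00:00:00
```

The midpoint between 2020-12-01 18:00 and 2020-12-02 00:00 falls on 2020-12-01 at 21:00, which is what the last row's Start Time should be.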

Here is what I want the code to produce:

    Datetime             Start Time           End Time
0   2020-12-01 00:00:00  2020-12-01 00:00:00  2020-12-01 00:30:00
1   2020-12-01 01:00:00  2020-12-01 00:30:00  2020-12-01 01:30:00
2   2020-12-01 02:00:00  2020-12-01 01:30:00  2020-12-01 02:30:00
3   2020-12-01 03:00:00  2020-12-01 02:30:00  2020-12-01 03:30:00
4   2020-12-01 04:00:00  2020-12-01 03:30:00  2020-12-01 04:30:00
5   2020-12-01 05:00:00  2020-12-01 04:30:00  2020-12-01 05:30:00
6   2020-12-01 06:00:00  2020-12-01 05:30:00  2020-12-01 07:30:00
7   2020-12-01 09:00:00  2020-12-01 07:30:00  2020-12-01 10:30:00
8   2020-12-01 12:00:00  2020-12-01 10:30:00  2020-12-01 15:00:00
9   2020-12-01 18:00:00  2020-12-01 15:00:00  2020-12-01 21:00:00
10  2020-12-02 00:00:00  2020-12-01 21:00:00  2020-12-02 00:00:00

Any help setting this problem up would be greatly appreciated.


Solution

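  • Use shift(1) to line each row up with the previous one. Adding half of the difference (previous minus current, so it is negative) to the current time gives the midpoint, which is Start Time. Each row's End Time is simply the next row's Start Time, so shift(-1) moves Start Time up by one row. In both cases fillna falls back to the original Datetime where there is no previous or next row: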

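    # Midpoint with the previous row: current time plus half of the (negative) gap;
    # the first row has no previous row (NaT), so fall back to its own Datetime
    df['Start Time'] = (df['Datetime'] + (df['Datetime'].shift(1) - df['Datetime']) / 2).fillna(df['Datetime'])
    # End Time is the next row's Start Time; the last row falls back to its own Datetime
    df['End Time'] = df['Start Time'].shift(-1).fillna(df['Datetime'])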
    df
    Out[1]: 
                  Datetime          Start Time            End Time
    0  2020-12-01 00:00:00 2020-12-01 00:00:00 2020-12-01 00:30:00
    1  2020-12-01 01:00:00 2020-12-01 00:30:00 2020-12-01 01:30:00
    2  2020-12-01 02:00:00 2020-12-01 01:30:00 2020-12-01 02:30:00
    3  2020-12-01 03:00:00 2020-12-01 02:30:00 2020-12-01 03:30:00
    4  2020-12-01 04:00:00 2020-12-01 03:30:00 2020-12-01 04:30:00
    5  2020-12-01 05:00:00 2020-12-01 04:30:00 2020-12-01 05:30:00
    6  2020-12-01 06:00:00 2020-12-01 05:30:00 2020-12-01 07:30:00
    7  2020-12-01 09:00:00 2020-12-01 07:30:00 2020-12-01 10:30:00
    8  2020-12-01 12:00:00 2020-12-01 10:30:00 2020-12-01 15:00:00
    9  2020-12-01 18:00:00 2020-12-01 15:00:00 2020-12-01 21:00:00
    10 2020-12-02 00:00:00 2020-12-01 21:00:00 2020-12-02 00:00:00
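An equivalent formulation, sketched here as an alternative rather than as part of the answer above, uses Series.diff to compute half of the gap to the previous and next rows directly. Both versions assume the rows are in chronological order; the sort_values call is an added safeguard (a no-op for this data), not something the question requires.

```python
import pandas as pd

df = pd.DataFrame({'Datetime': pd.to_datetime([
    '2020-12-01 00:00:00', '2020-12-01 01:00:00', '2020-12-01 02:00:00',
    '2020-12-01 03:00:00', '2020-12-01 04:00:00', '2020-12-01 05:00:00',
    '2020-12-01 06:00:00', '2020-12-01 09:00:00', '2020-12-01 12:00:00',
    '2020-12-01 18:00:00', '2020-12-02 00:00:00',
])})

# Midpoints only make sense between chronologically adjacent rows, so sort first
df = df.sort_values('Datetime', ignore_index=True)

# Half of the gap to the previous row (NaT for the first row)
half_prev = df['Datetime'].diff() / 2
# diff(-1) is t[i] - t[i+1] (negative), so negate it; NaT for the last row
half_next = -df['Datetime'].diff(-1) / 2

# Where a neighbour is missing the result is NaT, which fillna replaces with the row's own time
df['Start Time'] = (df['Datetime'] - half_prev).fillna(df['Datetime'])
df['End Time'] = (df['Datetime'] + half_next).fillna(df['Datetime'])
print(df)
```

This produces the same three columns as the shift-based version; computing End Time from its own gap instead of shifting Start Time just makes each column readable on its own.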