Here is my code setup:
import pandas as pd
# Raw timestamps as strings; kept in a separate dict so `df` is only ever a DataFrame
data = {'Datetime': ['2020-12-01 00:00:00', '2020-12-01 01:00:00', '2020-12-01 02:00:00',
                     '2020-12-01 03:00:00', '2020-12-01 04:00:00', '2020-12-01 05:00:00',
                     '2020-12-01 06:00:00', '2020-12-01 09:00:00', '2020-12-01 12:00:00',
                     '2020-12-01 18:00:00', '2020-12-02 00:00:00']}
df = pd.DataFrame(data, columns=['Datetime'])
# Convert the strings to datetime64 so arithmetic on them works
df['Datetime'] = pd.to_datetime(df['Datetime'])
df
That produces a dataframe with the following form:
Datetime
0 2020-12-01 00:00:00
1 2020-12-01 01:00:00
2 2020-12-01 02:00:00
3 2020-12-01 03:00:00
4 2020-12-01 04:00:00
5 2020-12-01 05:00:00
6 2020-12-01 06:00:00
7 2020-12-01 09:00:00
8 2020-12-01 12:00:00
9 2020-12-01 18:00:00
10 2020-12-02 00:00:00
What I want to do is find the midpoint between consecutive values and create two new columns in the dataframe, "Start Time" and "End Time". "Start Time" is the midpoint between the current time and the previous time, if one exists. "End Time" is the midpoint between the current time and the next time, if one exists. If there is no previous or next time, the current time is used instead.
Here is what I want the code to produce:
Datetime Start Time End Time
0 2020-12-01 00:00:00 2020-12-01 00:00:00 2020-12-01 00:30:00
1 2020-12-01 01:00:00 2020-12-01 00:30:00 2020-12-01 01:30:00
2 2020-12-01 02:00:00 2020-12-01 01:30:00 2020-12-01 02:30:00
3 2020-12-01 03:00:00 2020-12-01 02:30:00 2020-12-01 03:30:00
4 2020-12-01 04:00:00 2020-12-01 03:30:00 2020-12-01 04:30:00
5 2020-12-01 05:00:00 2020-12-01 04:30:00 2020-12-01 05:30:00
6 2020-12-01 06:00:00 2020-12-01 05:30:00 2020-12-01 07:30:00
7 2020-12-01 09:00:00 2020-12-01 07:30:00 2020-12-01 10:30:00
8 2020-12-01 12:00:00 2020-12-01 10:30:00 2020-12-01 15:00:00
9 2020-12-01 18:00:00 2020-12-01 15:00:00 2020-12-01 21:00:00
10 2020-12-02 00:00:00 2020-12-01 21:00:00 2020-12-02 00:00:00
Any help setting this problem up would be greatly appreciated.
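You can calculate the midpoint using shift to get the time difference between consecutive rows, then add half of that difference to the current time to get Start Time. The End Time of each row is simply the Start Time of the next row, so shift(-1) on Start Time gives End Time. In both cases, fillna falls back to the current time for the first and last rows: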
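# Midpoint between each time and the previous one; the first row has no previous value, so use its own time
df['Start Time'] = (df['Datetime'] + (df['Datetime'].shift(1) - df['Datetime']) / 2).fillna(df['Datetime'])
# Each End Time is the next row's Start Time; the last row has no next value, so use its own time
df['End Time'] = (df['Start Time'].shift(-1)).fillna(df['Datetime'])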
df
Out[1]:
Datetime Start Time End Time
0 2020-12-01 00:00:00 2020-12-01 00:00:00 2020-12-01 00:30:00
1 2020-12-01 01:00:00 2020-12-01 00:30:00 2020-12-01 01:30:00
2 2020-12-01 02:00:00 2020-12-01 01:30:00 2020-12-01 02:30:00
3 2020-12-01 03:00:00 2020-12-01 02:30:00 2020-12-01 03:30:00
4 2020-12-01 04:00:00 2020-12-01 03:30:00 2020-12-01 04:30:00
5 2020-12-01 05:00:00 2020-12-01 04:30:00 2020-12-01 05:30:00
6 2020-12-01 06:00:00 2020-12-01 05:30:00 2020-12-01 07:30:00
7 2020-12-01 09:00:00 2020-12-01 07:30:00 2020-12-01 10:30:00
8 2020-12-01 12:00:00 2020-12-01 10:30:00 2020-12-01 15:00:00
9 2020-12-01 18:00:00 2020-12-01 15:00:00 2020-12-01 21:00:00
10 2020-12-02 00:00:00 2020-12-01 21:00:00 2020-12-02 00:00:00
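As a possible variation, the same result can be sketched with diff so that each column is computed directly from Datetime rather than End Time depending on Start Time. This is only a sketch under a few assumptions: the column is already sorted in ascending order, the three-row sample below is a shortened stand-in for the data in the question, and half_gap is a hypothetical helper name that does not appear in the original code.

```python
import pandas as pd

# Shortened sample (assumption: timestamps are sorted in ascending order)
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2020-12-01 00:00:00",
        "2020-12-01 06:00:00",
        "2020-12-01 09:00:00",
    ])
})

# Half of the gap to the previous row; the first row has no previous
# value (NaT), so treat its offset as zero.
half_gap = (df["Datetime"].diff() / 2).fillna(pd.Timedelta(0))

# Start Time: step back half the gap to the previous row.
df["Start Time"] = df["Datetime"] - half_gap

# End Time: step forward half the gap to the next row, which is the
# next row's half_gap; the last row has no next value, so offset is zero.
df["End Time"] = df["Datetime"] + half_gap.shift(-1).fillna(pd.Timedelta(0))

print(df)
```

For this sample the Start Time values come out as 00:00, 03:00 and 07:30, and the End Time values as 03:00, 07:30 and 09:00, which follows the same rule as the shift-based version above. Working with an explicit half-gap offset can make the edge cases easier to read, since the first and last rows simply get a zero offset.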