pythonpandaspandas-resample

Pandas Resample with Linear Interpolation


I have some hourly data, such as below, with odd sample times.

# Date Time, GMT-08:00 Temp, °C
1 10/31/23 15:51 13.41
2 10/31/23 16:51 7.49
3 10/31/23 17:51 7.61
4 10/31/23 18:51 7.39
5 10/31/23 19:51 7.34
6 10/31/23 20:51 7.33
7 10/31/23 21:51 7.38

I would like to resample with interpolation so the data points occur on the hour. I.e. 1500, 1600, 1700...

I assumed the following would work, but I've been unable to make this do what I expected.

df.resample('60min').first().interpolate('linear')


Solution

  • IIUC:

    import pandas as pd
    
    # df
    data = {'Date Time, GMT-08:00': {1: '10/31/23 15:51', 2: '10/31/23 16:51', 
                                     3: '10/31/23 17:51', 4: '10/31/23 18:51', 
                                     5: '10/31/23 19:51', 6: '10/31/23 20:51', 
                                     7: '10/31/23 21:51'}, 
            'Temp, °C': {1: 13.41, 2: 7.49, 3: 7.61, 4: 7.39, 5: 7.34, 6: 7.33, 7: 7.38}
            }
    df = pd.DataFrame(data)
    
    # preliminary steps: convert to timestamps & set as index
    df['Date Time, GMT-08:00'] = pd.to_datetime(df['Date Time, GMT-08:00'], 
                                                format='%m/%d/%y %H:%M')
    
    df = df.set_index('Date Time, GMT-08:00')
    
    # resample to 1min, get first, interpolate, resample to 1h, and get `asfreq`
    df = df.resample('T').first().interpolate().resample('H').asfreq()
    df
    
                          Temp, °C
    Date Time, GMT-08:00          
    2023-10-31 15:00:00        NaN
    2023-10-31 16:00:00    12.5220
    2023-10-31 17:00:00     7.5080
    2023-10-31 18:00:00     7.5770
    2023-10-31 19:00:00     7.3825
    2023-10-31 20:00:00     7.3385
    2023-10-31 21:00:00     7.3375