pythonpandasdata-sciencedata-wranglingdata-scrubbing

Pandas calculating time deltas from index


I have a months time series data that I am trying calculate total hours, minutes, seconds in the dataset as well as for a unique Boolean column for when the column is True or a 1. And for some reason I am doing something wrong where the total time calculations don't appear correct. The code (runs) below goes through calculating the time delta between each index time stamp:

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/bbartling/Data/master/hvac_random_fake_data/testdf2_fc5.csv',
             index_col='Date',
             parse_dates=True)

print(df)

df["timedelta_alldata"] = df.index.to_series().diff()
seconds_alldata = df.timedelta_alldata.sum().seconds
print('SECONDS ALL DATA: ',seconds_alldata)

days_alldata = df.timedelta_alldata.sum().days
print('DAYS ALL DATA: ',days_alldata)

hours_alldata = round(seconds_alldata/3600, 2)
print('HOURS ALL DATA: ',hours_alldata)

minutes_alldata = round((seconds_alldata/60) % 60, 2)
total_hours_calc = days_alldata * 24.0 + hours_alldata
print('TOTAL HOURS CALC: ',total_hours_calc)

# fault flag 5 true time delta calc
df["timedelta_fddflag_fc5"] = df.index.to_series(
).diff().where(df["fc5_flag"] == 1)

seconds_fc5_mode = df.timedelta_fddflag_fc5.sum().seconds
print('FALT FLAG TRUE TOTAL SECONDS: ',seconds_fc5_mode)

hours_fc5_mode = round(seconds_fc5_mode/3600, 2)
print('FALT FLAG TRUE TOTAL HOURS: ',hours_fc5_mode)

percent_true_fc5 = round(df.fc5_flag.mean() * 100, 2)
print('PERCENT TIME WHEN FLAG 5 TRUE: ',percent_true_fc5,'%')

percent_false_fc5 = round((100 - percent_true_fc5), 2)
print('PERCENT TIME WHEN FLAG 5 FALSE: ',percent_false_fc5,'%')

returns:

SECONDS ALL DATA:  85500   <--- I think NOT correct
DAYS ALL DATA:  30
HOURS ALL DATA:  23.75   <--- I think NOT correct
TOTAL HOURS CALC:  743.75
FALT FLAG TRUE TOTAL SECONDS:  1800   <--- I think NOT correct
FALT FLAG TRUE TOTAL HOURS:  0.5   <--- I think NOT correct
PERCENT TIME WHEN FLAG 5 TRUE:  74.29 %
PERCENT TIME WHEN FLAG 5 FALSE:  25.71 %

30 days is correct (DAYS ALL DATA: 30) and the percent of time when a Boolean column (fc5_flag) is True or False but the total seconds and hours seems way off...? Would anyone have any tips to write this better?


Solution

  • For a pd.Timedelta object, .seconds and .days do not represent the total duration in the respective time units. It's just the way that the object stores a timespan internally. For example, 25 hours is stored as 1 day + 3600 seconds. This is consistent with Python's built-in datetime.timedelta.

    The correct way to convert it to different time units is to divide the object by another Timedelta object of the desired unit:

    delta = df.index.to_series().diff()
    total_hours = delta.sum() / pd.Timedelta(hours=1)
    hours_fc5_mode = (delta * df["fc5_flag"]).sum() / pd.Timedelta(hours=1)
    
    print(f"{total_hours=}")    # 743.75
    print(f"{hours_fc5_mode=}") # 552.5