I have a DataFrame with a column min_latency, which represents the minimum latency achieved by a predictor. If the predictor failed, there's no value, and therefore it returns min_latency=pd.NaT.
The DataFrame is created from a list of dicts, and if and only if all the rows have a pd.NaT value, the resulting column will have a datetime64[ns] dtype. It seems impossible to convert it to timedelta.
import pandas as pd

df = pd.DataFrame([{'id': i, 'min_latency': pd.NaT} for i in range(10)])
print(df['min_latency'].dtype) # datetime64[ns]
df['min_latency'].astype('timedelta64[ns]') # TypeError: Cannot cast DatetimeArray to dtype timedelta64[ns]
This problem doesn't happen if there's some timedelta in there:
import datetime as dt

df = pd.DataFrame([{'id': i, 'min_latency': pd.NaT} for i in range(10)] + [{'id': -1, 'min_latency': dt.timedelta(seconds=3)}])
print(df['min_latency'].dtype) # timedelta64[ns]
Naturally, the best thing would be to adjust the return value, using np.timedelta64 instead of pd.NaT.
import numpy as np

df = pd.DataFrame([{'id': i, 'min_latency': np.timedelta64('NaT', 'ns')}
                   for i in range(3)])
Output:
df['min_latency']
0 NaT
1 NaT
2 NaT
Name: min_latency, dtype: timedelta64[ns]
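As a minimal sketch of what "adjusting the return value" could look like on the producer side: the helper name min_latency_or_missing and the constant MISSING_LATENCY below are hypothetical, made up for illustration, and not part of the original code. The idea is only that a failed predictor returns a NaT that is explicitly typed as a timedelta, so the column dtype should not depend on whether any predictor succeeded.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Typed missing value: a NaT that is explicitly a timedelta, not a datetime.
MISSING_LATENCY = np.timedelta64('NaT', 'ns')


def min_latency_or_missing(latencies):
    """Hypothetical helper: smallest latency, or a timedelta NaT if there are none."""
    return min(latencies) if latencies else MISSING_LATENCY


# Every predictor failed: the column is still inferred as a timedelta.
all_failed = pd.DataFrame(
    [{'id': i, 'min_latency': min_latency_or_missing([])} for i in range(3)]
)

# One failure and one success, mixed in the same column.
some_ok = pd.DataFrame([
    {'id': 0, 'min_latency': min_latency_or_missing([])},
    {'id': 1, 'min_latency': min_latency_or_missing(
        [dt.timedelta(seconds=3), dt.timedelta(seconds=5)])},
])

print(all_failed['min_latency'].dtype)
print(some_ok['min_latency'].dtype)
```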
If that is not an option, you can check is_datetime64_dtype. If True, first use Series.values to return the column as an ndarray and then apply np.ndarray.astype:
from pandas.api.types import is_datetime64_dtype

df = pd.DataFrame([{'id': i, 'min_latency': pd.NaT}
                   for i in range(3)])

# Only convert when pandas inferred a datetime dtype from the all-NaT column
if is_datetime64_dtype(df['min_latency']):
    df['min_latency'] = df['min_latency'].values.astype('timedelta64[ns]')
Output:
df['min_latency']
0 NaT
1 NaT
2 NaT
Name: min_latency, dtype: timedelta64[ns]
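If the check is needed in more than one place, it could be wrapped in a small function. The sketch below is one possible shape, not a definitive implementation: the name coerce_all_nat_to_timedelta is hypothetical, and the extra isna().all() guard is an added assumption so that a column holding real timestamps is never reinterpreted as durations.

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype


def coerce_all_nat_to_timedelta(series):
    """Hypothetical helper: turn an all-NaT datetime column into a timedelta column.

    Any other column is returned unchanged.
    """
    if is_datetime64_dtype(series) and series.isna().all():
        # Go through the underlying ndarray, as above, then rebuild the Series
        # so that the index and name are preserved.
        values = series.values.astype('timedelta64[ns]')
        return pd.Series(values, index=series.index, name=series.name)
    return series


df = pd.DataFrame([{'id': i, 'min_latency': pd.NaT} for i in range(3)])
df['min_latency'] = coerce_all_nat_to_timedelta(df['min_latency'])

# A column with an actual timestamp is left alone.
stamps = pd.Series([pd.Timestamp('2024-01-01'), pd.NaT], name='when')
untouched = coerce_all_nat_to_timedelta(stamps)

print(df['min_latency'].dtype)
print(untouched.dtype)
```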
If you want to rely solely on pandas, you will first need to change the values of df['min_latency'] into values that can be understood as a duration. E.g., using pd.to_timedelta + Series.dt.nanosecond (for a NaT entry, dt.nanosecond yields NaN, which pd.to_timedelta converts back to NaT):
if is_datetime64_dtype(df['min_latency']):
    df['min_latency'] = pd.to_timedelta(df['min_latency'].dt.nanosecond,
                                        unit='ns')