DataFrame with all NaT should be timedelta and not datetime

I have a DataFrame with a column min_latency, which represents the minimum latency achieved by a predictor. If the predictor failed, there's no value, and therefore it returns min_latency=pd.NaT.

The dataframe is created from a dict, and if and only if all the rows have a pd.NaT value, the resulting column will have a datetime64[ns] dtype. It seems impossible to convert it to timedelta.

import pandas as pd

df = pd.DataFrame([{'id': i, 'min_latency': pd.NaT} for i in range(10)])
print(df['min_latency'].dtype)  # datetime64[ns]
df['min_latency'].astype('timedelta64[ns]')  # TypeError: Cannot cast DatetimeArray to dtype timedelta64[ns]

This problem doesn't happen if there's some timedelta in there:

import datetime as dt

df = pd.DataFrame([{'id': i, 'min_latency': pd.NaT} for i in range(10)] + [{'id': -1, 'min_latency': dt.timedelta(seconds=3)}])
print(df['min_latency'].dtype)  # timedelta64[ns]

Solution

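pd.NaT is the missing-value marker for both datetimes and timedeltas, so a column that contains nothing else gives pandas no hint about which one you meant, and it falls back to datetime64[ns]. The best fix is therefore to adjust the return value and use np.timedelta64('NaT', 'ns') instead of pd.NaT, which is unambiguously a timedelta.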

import numpy as np
import pandas as pd

# A typed NaT: pandas infers timedelta64[ns] even when every row is missing.
df = pd.DataFrame([{'id': i, 'min_latency': np.timedelta64('NaT', 'ns')}
                   for i in range(3)])

Output:

df['min_latency']

0   NaT
1   NaT
2   NaT
Name: min_latency, dtype: timedelta64[ns]
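
To show how this could look on the producing side, here is a minimal sketch. The function run_predictor and the constant MISSING_LATENCY are hypothetical names made up for illustration (they are not from the question); the point is only that returning a typed NaT on failure keeps the column a timedelta whether all, some, or none of the predictors fail.

```python
import numpy as np
import pandas as pd

# Typed missing value: unlike pd.NaT, it can only be read as a timedelta.
MISSING_LATENCY = np.timedelta64('NaT', 'ns')


def run_predictor(i):
    """Hypothetical predictor: even ids fail, odd ids report a latency."""
    if i % 2 == 0:
        return MISSING_LATENCY
    return pd.Timedelta(milliseconds=10 * i)


# Every predictor failed.
all_failed = pd.DataFrame([{'id': i, 'min_latency': MISSING_LATENCY}
                           for i in range(3)])

# Some predictors failed, some succeeded.
mixed = pd.DataFrame([{'id': i, 'min_latency': run_predictor(i)}
                      for i in range(4)])

# Both columns should report a timedelta dtype.
print(all_failed['min_latency'].dtype)
print(mixed['min_latency'].dtype)
```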

If that is not an option, check the column with is_datetime64_dtype. If it returns True, use Series.values to get the column as a NumPy ndarray and then call np.ndarray.astype on it:

import pandas as pd
from pandas.api.types import is_datetime64_dtype

df = pd.DataFrame([{'id': i, 'min_latency': pd.NaT}
                   for i in range(3)])

if is_datetime64_dtype(df['min_latency']):
    df['min_latency'] = df['min_latency'].values.astype('timedelta64[ns]')

Output:

df['min_latency']

0   NaT
1   NaT
2   NaT
Name: min_latency, dtype: timedelta64[ns]

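If you want to rely solely on pandas, you first need to turn the values of df['min_latency'] into something that can be read as a duration, e.g. with pd.to_timedelta + Series.dt.nanosecond. For NaT, .dt.nanosecond yields NaN, which pd.to_timedelta maps back to NaT. Note that this is only meaningful for an all-NaT column: a real timestamp would be reduced to its nanosecond component.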

if is_datetime64_dtype(df['min_latency']):
    df['min_latency'] = pd.to_timedelta(df['min_latency'].dt.nanosecond, 
                                        unit='ns')
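
Since is_datetime64_dtype is also True for a column that holds real timestamps, you may want a stricter guard. Below is a minimal sketch of a helper, under the assumption that only all-NaT columns should be recast; the name all_nat_to_timedelta and the stamps example frame are my own and not part of pandas or the question.

```python
import pandas as pd
from pandas.api.types import is_datetime64_dtype


def all_nat_to_timedelta(df, column):
    """Recast `column` to timedelta64[ns] only if it is a datetime column
    in which every value is NaT; otherwise leave it untouched."""
    col = df[column]
    if is_datetime64_dtype(col) and col.isna().all():
        # Go through the underlying ndarray, as in the second approach above.
        df[column] = col.values.astype('timedelta64[ns]')
    return df


# All-NaT column: inferred as datetime, then recast to timedelta.
failed = pd.DataFrame([{'id': i, 'min_latency': pd.NaT} for i in range(3)])
failed = all_nat_to_timedelta(failed, 'min_latency')
print(failed['min_latency'].dtype)  # timedelta64[ns]

# Column with a real timestamp: the guard leaves it as a datetime.
stamps = pd.DataFrame({'when': pd.to_datetime(['2024-01-01', None])})
stamps = all_nat_to_timedelta(stamps, 'when')
print(stamps['when'].dtype)
```

The extra isna().all() check costs one pass over the column and avoids silently turning real timestamps into durations.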