python pandas timedelta mixed-type

Check if value is equal to 0 for mixed type column (with timedelta and floats)


Let's say we have the following dataframe. In my real case it is a comparison of columns after melting, which is why the column has mixed types.

df = pd.DataFrame({'value':[0.0, 0.0, pd.Timedelta(hours=1), pd.Timedelta(0)]})

             value
0                0
1                0
2  0 days 01:00:00
3  0 days 00:00:00

What I want to do is check whether each value is equal to 0 and build a conditional column based on that.

So first we have to get a boolean mask marking which rows are 0. Simply using eq or == won't work, since the zero Timedelta in row 3 comes back False:

df['value'].eq(0)

0     True
1     True
2    False
3    False
Name: value, dtype: bool

This is probably because of the Timedelta values, so I thought: let's convert the timedeltas to seconds. First I checked which rows hold a Timedelta:

df['value'].apply(type) == pd._libs.tslibs.timedeltas.Timedelta

0    False
1    False
2     True
3     True
Name: value, dtype: bool

Which works.
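The same type check can also be written against the public pd.Timedelta class with isinstance, which avoids reaching into the private pd._libs path. A minimal sketch on the example frame (the name is_td is only illustrative):

```python
import pandas as pd

# Example frame from the question: floats and Timedelta objects in one column.
df = pd.DataFrame({'value': [0.0, 0.0, pd.Timedelta(hours=1), pd.Timedelta(0)]})

# Element-wise isinstance check against the public pd.Timedelta class.
# is_td is an illustrative name, not something from the original post.
is_td = df['value'].map(lambda x: isinstance(x, pd.Timedelta))
print(is_td.tolist())
```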

Then I tried the following, which did not work:

np.where(df['value'].apply(type) == pd._libs.tslibs.timedeltas.Timedelta, 
         df['value'].total_seconds(), 
         df['value'])

'Series' object has no attribute 'total_seconds'
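As far as I can tell, the call fails for two reasons: np.where evaluates both branches for the whole column before picking values, and total_seconds is a method of individual Timedelta objects (or of the .dt accessor), not of a Series. The .dt accessor does not help here either, because the mixed column has object dtype. A small sketch that shows this (dt_error is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'value': [0.0, 0.0, pd.Timedelta(hours=1), pd.Timedelta(0)]})

# Mixing floats and Timedelta objects leaves the column as generic object dtype.
print(df['value'].dtype)

# The .dt accessor is only available for datetime-like dtypes,
# so it raises an AttributeError on this object column.
try:
    df['value'].dt.total_seconds()
    dt_error = None
except AttributeError as exc:
    dt_error = exc
print(dt_error)
```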

Finally, this works.

df['value'].apply(lambda x: x.total_seconds() if type(x) == pd._libs.tslibs.timedeltas.Timedelta else x).eq(0)

0     True
1     True
2    False
3     True
Name: value, dtype: bool

But it's quite slow and does not look "pandas-like".

So my question is: is there a faster, more idiomatic solution?


Solution

  • I would 'upgrade' the plain numbers to timedelta:

    pd.to_timedelta(df.value).dt.total_seconds()==0
    Out[232]: 
    0     True
    1     True
    2    False
    3     True
    Name: value, dtype: bool
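One thing to keep in mind with this approach: pd.to_timedelta reads bare numbers using its unit argument, which defaults to nanoseconds, so it is a good fit when the only question is whether a number is exactly 0. If the floats could carry other meanings, a possible alternative is to compare each kind of value in its own dtype and combine the results. This is only a sketch on the example frame, and the names is_td, td_zero, num_zero and is_zero are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'value': [0.0, 0.0, pd.Timedelta(hours=1), pd.Timedelta(0)]})

# Mark which rows hold Timedelta objects.
is_td = df['value'].map(lambda x: isinstance(x, pd.Timedelta))

# Keep only the Timedelta rows (others become NaN -> NaT) and compare to a zero Timedelta.
td_zero = pd.to_timedelta(df['value'].where(is_td)).eq(pd.Timedelta(0))

# Keep only the numeric rows (others become NaN) and compare to 0.
num_zero = pd.to_numeric(df['value'].where(~is_td)).eq(0)

# A row counts as zero if either comparison matched.
is_zero = td_zero | num_zero
df['is_zero'] = is_zero
print(df['is_zero'].tolist())
```

Comparisons against NaT and NaN evaluate to False, so the masked-out rows never produce a false match in either half.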