I am trying to make a plot and get a strange error:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00', '0 days 18:00:00'],
dtype='timedelta64[ns]', freq='6H')
ts1 = pd.Series(np.array([ 0., 5439.802205, 4506.0691, 640.734375]), index=idx)
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)
plt.figure()
plt.fill_between(ts1.index, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(ts1.index, ts1, ts2, where=(ts1 <= ts2))
plt.show()
This results in a
DTypePromotionError: The DType <class 'numpy.dtypes.TimeDelta64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.TimeDelta64DType'>, <class 'numpy.dtypes.Float64DType'>)
Using some other index works:
# ... as above
ts1 = pd.Series(np.array([ 0., 5439.802205, 4506.0691, 640.734375]))
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]))
# as above ...
So far, nothing unusual. But if I instead replace the values
# ... as above
ts1 = pd.Series(range(4), index=idx)
ts2 = pd.Series(reversed(range(4)), index=idx)
# as above ...
it works as well! So, neither the original index nor the original values are the single cause of the issue. The combination of both is.
Can someone explain this to me, please?
TL;DR
Convert the index to seconds with pd.TimedeltaIndex.seconds to get compatible dtypes (integers and floats) and use plt.xticks to correct ticks (seconds) and labels (timedelta strings):
... # as above, in Q
idx_s = ts1.index.seconds
# or use .astype(np.int64) if you want `ns`, but you don't need it here
plt.fill_between(idx_s, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(idx_s, ts1, ts2, where=(ts1 <= ts2))
plt.xticks(ticks=idx_s, labels=idx, rotation=45, ha='right')
plt.show()
The issue lies with the dependency of plt.fill_between on numpy. Here's a proper full traceback:
Traceback (most recent call last):
Cell In[33], line 11
plt.fill_between(ts1.index, ts1, ts2, where=(ts1 > ts2))
File ~\anaconda3\lib\site-packages\matplotlib\pyplot.py:3315 in fill_between
return gca().fill_between(
File ~\anaconda3\lib\site-packages\matplotlib\__init__.py:1473 in inner
return func(
File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:5648 in fill_between
return self._fill_between_x_or_y(
File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:5632 in _fill_between_x_or_y
pts = np.vstack([np.hstack([ind[where, None], dep1[where, None]]),
File ~\anaconda3\lib\site-packages\numpy\core\shape_base.py:359 in hstack
return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
DTypePromotionError: The DType <class 'numpy.dtypes.TimeDelta64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.TimeDelta64DType'>, <class 'numpy.dtypes.Float64DType'>)
The ultimate issue here is that matplotlib uses numpy to arrive at pts, i.e. the area that needs to be filled. See this code. E.g., as part of that attempt, it tries to do something like this:
np.hstack([idx.values[:, None], ts1.values[:, None]])
This fails (with our DTypePromotionError) because idx.values has dtype dtype('<m8[ns]'), while ts1.values has dtype dtype('float64'). Those are incompatible, meaning that numpy cannot promote those two dtypes to a common dtype: you cannot have fractional values in timedelta64[ns].
As mentioned by the error message, the above could work with dtype=object, but as a consequence you would create all sorts of performance and optimization issues, such as loss of vectorization.
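To illustrate what the error message means by storing both in an object array, here is a minimal sketch. The names td_values and float_values are made up for this example and stand in for idx.values and ts1.values; casting to object is one way to force the stack, not something matplotlib does for you.

```python
import numpy as np

# Stand-ins for idx.values and ts1.values (hypothetical names).
td_values = np.array([0, 6, 12, 18], dtype="timedelta64[h]").astype("timedelta64[ns]")
float_values = np.array([0.0, 5439.802205, 4506.0691, 640.734375])

# Casting both columns to object sidesteps dtype promotion entirely.
stacked = np.hstack([
    td_values[:, None].astype(object),
    float_values[:, None].astype(object),
])

print(stacked.dtype)
print(stacked.shape)
```

The result is a generic object array, so numeric operations on it fall back to slow element-by-element Python calls, which is why this is not a practical fix for plotting.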
You mentioned that this does work with:
ts1 = pd.Series(range(4), index=idx)
ts2 = pd.Series(reversed(range(4)), index=idx)
Now, this makes sense in that the dtype of both series is dtype('int64'), which is compatible with dtype('<m8[ns]'):
np.hstack([idx.values[:, None], ts1.values[:, None]])
array([[ 0, 0],
[21600000000000, 1],
[43200000000000, 2],
[64800000000000, 3]], dtype='timedelta64[ns]')
Notice that the operation has converted the integers to 'timedelta64[ns]'. It is treating them as if they are also nanoseconds.
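The two cases can be compared side by side without matplotlib. The following is a rough sketch; stacked_dtype is a hypothetical helper written for this example, and it relies on DTypePromotionError being a subclass of TypeError.

```python
import numpy as np

td = np.array([0, 6, 12, 18], dtype="timedelta64[h]").astype("timedelta64[ns]")
as_float = np.array([0.0, 5439.802205, 4506.0691, 640.734375])
as_int = np.arange(4, dtype=np.int64)


def stacked_dtype(a, b):
    """Return the dtype of the column-stacked arrays, or None if promotion fails."""
    try:
        return np.hstack([a[:, None], b[:, None]]).dtype
    except TypeError:  # DTypePromotionError derives from TypeError
        return None


float_result = stacked_dtype(td, as_float)  # timedelta + float: no common dtype
int_result = stacked_dtype(td, as_int)      # timedelta + int: ints become timedeltas

print(float_result)
print(int_result)
```

So the integer case only "works" because numpy silently reinterprets the y-values as nanoseconds, which is probably not what you want either.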
So, the solution here is not to work with 'timedelta64[ns]', but to convert the values of idx to (nano)seconds, to get integers (compatible with floats). (Using pd.TimedeltaIndex.seconds will get you dtype='int32', which suffices for your example. Note that .seconds only returns the seconds component within each day, so for an index spanning a day or more use idx.total_seconds() instead. If you really want nanoseconds, use idx.astype(np.int64).)
Then use plt.xticks to properly set the ticks and labels as required. Full code:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00',
'0 days 18:00:00'], dtype='timedelta64[ns]', freq='6h')
# 'H' is deprecated, use 'h' instead.
ts1 = pd.Series(np.array([ 0., 5439.802205, 4506.0691, 640.734375]), index=idx)
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)
plt.figure()
idx_s = ts1.index.seconds
# or use .astype(np.int64) if you want `ns`, but you don't need it here
plt.fill_between(idx_s, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(idx_s, ts1, ts2, where=(ts1 <= ts2))
plt.xticks(ticks=idx_s, labels=idx, rotation=45, ha='right')
plt.show()
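One caveat on the conversion used above: .seconds is only the seconds component of each timedelta (less than one day), so it wraps around for longer spans. Here is a small sketch of the difference, assuming a made-up index that extends to 30 hours:

```python
import numpy as np
import pandas as pd

# Hypothetical index that runs past one day (30 h = 1 day + 6 h).
idx = pd.to_timedelta([0, 6, 12, 18, 30], unit="h")

# Seconds component within each day: the 30 h entry wraps back to 6 h.
seconds_component = np.asarray(idx.seconds)

# Total duration in seconds as floats: monotonic, safe as an x-axis.
total_seconds = np.asarray(idx.total_seconds())

print(seconds_component)
print(total_seconds)
```

For the four-point, sub-day index in the question both give the same x-positions, so idx.seconds is fine there; for anything longer, idx.total_seconds() is the safer choice.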