pandas numpy matplotlib plot time-series

fill_between plot fails on a specific index/value combination for a pandas time series


I am trying to make a plot and get a strange error:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00', '0 days 18:00:00'],
                        dtype='timedelta64[ns]', freq='6H')
ts1 = pd.Series(np.array([  0., 5439.802205, 4506.0691, 640.734375]), index=idx)
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)

plt.figure()

plt.fill_between(ts1.index, ts1, ts2, where=(ts1 > ts2))
plt.fill_between(ts1.index, ts1, ts2, where=(ts1 <= ts2))

plt.show()

This results in a

DTypePromotionError: The DType <class 'numpy.dtypes.TimeDelta64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.TimeDelta64DType'>, <class 'numpy.dtypes.Float64DType'>)

Using the default (integer) index instead works:

# ... as above
ts1 = pd.Series(np.array([  0., 5439.802205, 4506.0691, 640.734375]))
ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]))
# as above ...

So far, nothing unusual. But if I keep the timedelta index and instead replace the values

# ... as above
ts1 = pd.Series(range(4), index=idx)
ts2 = pd.Series(reversed(range(4)), index=idx)
# as above ...

it works as well! So neither the original index nor the original values alone cause the issue; the combination of both does.

Can someone explain this to me, please?


Solution

  • TL;DR

    Convert the index to seconds with pd.TimedeltaIndex.seconds to get compatible dtypes (integers and floats), and use plt.xticks to set the ticks (seconds) and labels (timedelta strings):

    ... # as above, in Q
    
    idx_s = ts1.index.seconds 
    # or use .astype(np.int64) if you want `ns`, but you don't need it here
    
    plt.fill_between(idx_s, ts1, ts2, where=(ts1 > ts2))
    plt.fill_between(idx_s, ts1, ts2, where=(ts1 <= ts2))
    
    plt.xticks(ticks=idx_s, labels=idx, rotation=45, ha='right')
    plt.show()
    

    Output:

    [image: fill between plot]


    The issue comes from how plt.fill_between uses numpy internally. Here is the full traceback:

    Traceback (most recent call last):
    
      Cell In[33], line 11
        plt.fill_between(ts1.index, ts1, ts2, where=(ts1 > ts2))
    
      File ~\anaconda3\lib\site-packages\matplotlib\pyplot.py:3315 in fill_between
        return gca().fill_between(
    
      File ~\anaconda3\lib\site-packages\matplotlib\__init__.py:1473 in inner
        return func(
    
      File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:5648 in fill_between
        return self._fill_between_x_or_y(
    
      File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:5632 in _fill_between_x_or_y
        pts = np.vstack([np.hstack([ind[where, None], dep1[where, None]]),
    
      File ~\anaconda3\lib\site-packages\numpy\core\shape_base.py:359 in hstack
        return _nx.concatenate(arrs, 1, dtype=dtype, casting=casting)
    
    DTypePromotionError: The DType <class 'numpy.dtypes.TimeDelta64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.TimeDelta64DType'>, <class 'numpy.dtypes.Float64DType'>)
    

    The root cause is that matplotlib uses numpy to compute pts, i.e. the vertices of the area that needs to be filled (see _fill_between_x_or_y in matplotlib/axes/_axes.py, the last matplotlib frame in the traceback). As part of that, it does something like this:

    np.hstack([idx.values[:, None], ts1.values[:, None]])
    

    This fails (with our DTypePromotionError) because idx.values has dtype dtype('<m8[ns]'), while ts1.values has dtype dtype('float64'). Those are incompatible, meaning that numpy cannot promote those two dtypes to a common dtype: you cannot have fractional values in timedelta64[ns].
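A minimal sketch that reproduces the failure without matplotlib. It mimics the masked hstack step from _fill_between_x_or_y rather than calling matplotlib's internals, and it uses np.promote_types to show the underlying "no common dtype" check. DTypePromotionError (NumPy 1.25 and later) is a subclass of TypeError, so catching TypeError should also cover older NumPy versions, which raised a plain TypeError here.

```python
import numpy as np
import pandas as pd

idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00',
                         '0 days 12:00:00', '0 days 18:00:00'])
ts1 = pd.Series([0., 5439.802205, 4506.0691, 640.734375], index=idx)

# Boolean mask selecting all points (stands in for the `where` argument)
where = np.ones(len(idx), dtype=bool)

# Same pattern as matplotlib: stack x (timedelta64[ns]) next to y (float64)
try:
    np.hstack([idx.values[where, None], ts1.values[where, None]])
    failed = False
except TypeError as exc:  # DTypePromotionError subclasses TypeError
    failed = True
    print(type(exc).__name__)

# The underlying check: numpy finds no common dtype for these two
try:
    np.promote_types(np.dtype('m8[ns]'), np.dtype('f8'))
    promotable = True
except TypeError:
    promotable = False
```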

    As the error message suggests, stacking could work with dtype=object, but at the cost of performance: an object array holds boxed Python objects, so you lose vectorization and other numpy optimizations.
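For illustration only, a small sketch of that object-dtype route: casting both columns to object before stacking sidesteps dtype promotion, so no error is raised, but the resulting array is no longer numeric. The sample arrays are made up for the example.

```python
import numpy as np

# Hypothetical sample data: timedelta x values and float y values
x = np.array([0, 6, 12], dtype='timedelta64[h]').astype('m8[ns]')
y = np.array([0., 5439.802205, 4506.0691])

# Casting to object first means no common numeric dtype is needed...
pts = np.hstack([x[:, None].astype(object), y[:, None].astype(object)])
print(pts.dtype)  # object
# ...but every element is now a boxed Python object, so numeric
# operations on pts no longer run as fast vectorized loops.
```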

    You mentioned that this does work with:

    ts1 = pd.Series(range(4), index=idx)
    ts2 = pd.Series(reversed(range(4)), index=idx)
    

    This makes sense: the dtype of both series is now dtype('int64'), which numpy can promote to dtype('<m8[ns]'):

    np.hstack([idx.values[:, None], ts1.values[:, None]])
    
    array([[             0,              0],
           [21600000000000,              1],
           [43200000000000,              2],
           [64800000000000,              3]], dtype='timedelta64[ns]')
    

    Notice that the operation has converted the integers to 'timedelta64[ns]', i.e. it treats them as if they were nanoseconds too.
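A short sketch of that promotion rule in isolation, assuming plain numpy arrays in place of the pandas objects: integers promote to timedelta64[ns] without error, and the integer 1 ends up meaning one nanosecond.

```python
import numpy as np

# Timedelta x values (6-hour steps, stored as ns) and plain int64 y values
td = np.array([0, 6, 12, 18], dtype='timedelta64[h]').astype('m8[ns]')
ints = np.arange(4, dtype=np.int64)

# Unlike float64, int64 has a common dtype with timedelta64[ns]
print(np.result_type(td, ints))  # timedelta64[ns]

stacked = np.hstack([td[:, None], ints[:, None]])
# The integer column is reinterpreted as durations: 1 -> 1 nanosecond
print(stacked[1, 1])
```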

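    So the solution is not to work with 'timedelta64[ns]' at all, but to convert the values of idx to (nano)seconds, which gives integers that are compatible with floats. (Using pd.TimedeltaIndex.seconds will get you dtype='int32', which suffices for your example. Note that .seconds returns only the seconds component of each timedelta (0 to 86399), not the total duration, so for an index spanning a day or more use idx.total_seconds() instead. If you really want nanoseconds, use idx.astype(np.int64).)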
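A sketch comparing the three conversions on a hypothetical index that spans more than one day (not the question's data). TimedeltaIndex.seconds holds only the seconds component within the day, so 30 hours shows up as 21600 (the 6 hours left after the full day). total_seconds() gives the full duration as floats, and astype(np.int64) gives nanoseconds.

```python
import numpy as np
import pandas as pd

# Hypothetical index: 0 h, 6 h and 30 h (= 1 day 6 h)
idx = pd.to_timedelta([0, 6, 30], unit='h')

secs = list(idx.seconds)          # seconds within the day: 0, 21600, 21600
total = list(idx.total_seconds()) # full duration: 0.0, 21600.0, 108000.0
nanos = list(idx.astype(np.int64))  # full duration in nanoseconds

print(secs, total, nanos)
```

For the question's index (all within the first day) .seconds and .total_seconds() agree; they only diverge once the index passes 24 hours.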

    Then use plt.xticks to properly set the ticks and labels as required. Full code:

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    
    idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00', '0 days 12:00:00', 
                             '0 days 18:00:00'], dtype='timedelta64[ns]', freq='6h')
    # 'H' is deprecated, use 'h' instead.
    
    ts1 = pd.Series(np.array([  0., 5439.802205, 4506.0691, 640.734375]), index=idx)
    ts2 = pd.Series(np.array([747., 740.4, 717., 740.4]), index=idx)
    
    plt.figure()
    
    idx_s = ts1.index.seconds
    # or use .astype(np.int64) if you want `ns`, but you don't need it here
    
    plt.fill_between(idx_s, ts1, ts2, where=(ts1 > ts2))
    plt.fill_between(idx_s, ts1, ts2, where=(ts1 <= ts2))
    
    plt.xticks(ticks=idx_s, labels=idx, rotation=45, ha='right')
    plt.show()
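An alternative sketch, not the approach above: plot against total seconds and let matplotlib.ticker.FuncFormatter render each tick position as a timedelta string, so labels stay consistent when matplotlib picks its own tick locations (e.g. after zooming). The formatter name fmt_timedelta is hypothetical, and the Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import pandas as pd

idx = pd.TimedeltaIndex(['0 days 00:00:00', '0 days 06:00:00',
                         '0 days 12:00:00', '0 days 18:00:00'])
ts1 = pd.Series([0., 5439.802205, 4506.0691, 640.734375], index=idx)
ts2 = pd.Series([747., 740.4, 717., 740.4], index=idx)

# Numeric x values (float64 seconds), compatible with the float y values
x = idx.total_seconds().to_numpy()

def fmt_timedelta(value, pos):
    """Hypothetical helper: format a tick position in seconds as a Timedelta string."""
    return str(pd.Timedelta(seconds=value))

fig, ax = plt.subplots()
ax.fill_between(x, ts1, ts2, where=(ts1 > ts2))
ax.fill_between(x, ts1, ts2, where=(ts1 <= ts2))
ax.xaxis.set_major_formatter(FuncFormatter(fmt_timedelta))
fig.autofmt_xdate()  # rotate and right-align the tick labels
```

Because the labels are computed from the tick value rather than fixed with plt.xticks, this variant does not depend on the ticks lining up with the index entries.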