pandastime-seriesyfinancepandas-timeindex

How to resample intra-day intervals and use .idxmax()?


I am using data from yfinance which returns a pandas Data-Frame.

                            Volume
Datetime                          
2021-09-13 09:30:00-04:00   951104
2021-09-13 09:35:00-04:00   408357
2021-09-13 09:40:00-04:00   498055
2021-09-13 09:45:00-04:00   466363
2021-09-13 09:50:00-04:00   315385
2021-12-06 15:35:00-05:00   200748
2021-12-06 15:40:00-05:00   336136
2021-12-06 15:45:00-05:00   473106
2021-12-06 15:50:00-05:00   705082
2021-12-06 15:55:00-05:00  1249763

There are 5 minute intra-day intervals in the data-frame. I want to resample to daily data and get the idxmax of the maximum volume for that day.

df.resample("B")["Volume"].idxmax()

Returns an error:

ValueError: attempt to get argmax of an empty sequence

I used B(business-days) as the resampling period, so there shouldn't be any empty sequences.

I should say .max() works fine.

Also using .agg as was suggested in another question returns an error:

df["Volume"].resample("B").agg(lambda x : np.nan if x.count() == 0 else x.idxmax()) 

error:

IndexError: index 77 is out of bounds for axis 0 with size 0

Solution

  • You can use groupby as an alternative of resample:

    >>> df.groupby(df.index.normalize())['Volume'].agg(Datetime='idxmax', Volume='max')
    
                          Datetime   Volume
    Datetime                               
    2021-09-13 2021-09-13 09:30:00   951104
    2021-12-06 2021-12-06 15:55:00  1249763