pythonpandasdataframe

Find max/min value in a column in a range of data (multiindex) and append to a different column


I have the following dataframe:

import pandas as pd
import csv

lst = [['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14, '8:30:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61, '8:31:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2, '8:32:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32, '8:33:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5, '8:34:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5, '8:33:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
       ['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2, '8:30:00'],
       ['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2, '8:31:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20, '8:32:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1, '8:34:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2, '8:35:00']
      ]

df = pd.DataFrame(lst, columns = ['Symbol', 'Date', 'open', 'high', 'low', 'close', 'volume', 'hour'])

df['Date'] = pd.to_datetime(df['Date'])
df['hour'] = pd.to_datetime(df['hour'], format='%H:%M:%S')
df.index = pd.MultiIndex.from_arrays([df['Symbol'], df['Date']], names=['Symbol', 'Date'])
df.drop(columns=['Symbol', 'Date'], inplace=True)

When run, it yields this:

                                 open  high   low  close  volume                hour day_high day_low
Symbol               Date                                                                            
SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14 1900-01-01 08:30:00                 
                     2025-07-14  2.80  2.80  2.50   2.53      61 1900-01-01 08:31:00                 
                     2025-07-14  2.45  2.45  2.45   2.45       2 1900-01-01 08:32:00                 
                     2025-07-14  2.58  2.80  2.58   2.60      32 1900-01-01 08:33:00                 
                     2025-07-14  2.50  2.50  2.25   2.30       5 1900-01-01 08:34:00                 
SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246 1900-01-01 08:30:00                 
                     2025-07-09  0.05  0.10  0.03   0.07     452 1900-01-01 08:31:00                 
                     2025-07-09  0.07  0.10  0.05   0.07     137 1900-01-01 08:32:00                 
                     2025-07-09  0.07  0.07  0.07   0.07       5 1900-01-01 08:33:00                 
                     2025-07-09  0.07  0.07  0.05   0.05     225 1900-01-01 08:34:00                 
SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2 1900-01-01 08:30:00                 
                     2025-07-11  7.20  7.20  7.20   7.20       2 1900-01-01 08:31:00                 
                     2025-07-11  6.92  6.92  6.92   6.92      20 1900-01-01 08:32:00                 
                     2025-07-11  6.58  6.58  6.58   6.58       1 1900-01-01 08:34:00                 
                     2025-07-11  6.41  6.41  6.41   6.41       2 1900-01-01 08:35:00  

This is a fragment of a dataframe that has 820,000 lines. These are options contracts trading over approximately two weeks, with 1 minute bars. I have set up a multiindex with Symbol and Date because it just makes sense to me to filter data that way.

What I want to do, is find max value for a contract on a given day, and append it to the column 'day_high' where the time in 'hour' column is equal to 08:30:00.

I've tried the following line of code:

df.loc[('SPXW 250715C06310000', '2025-07-14'), 'day_high'] = df.loc[('SPXW 250715C06310000', '2025-07-14')]['high'].max()

This line of code puts the max value in all rows. I've tried a few different modifiers to stick specifically in the row where hour = 8:30:00, but I've had no luck.

So... here are my questions:

  1. How can I assign the max, or whatever value from a given day to a different column where 'hour'=8:30:00?

  2. Same as question #2, but for all contracts in the dataframe. Will I need to set up a for loop or is there an idiomatic way of doing this with pandas?

TIA


Solution

  • You need assign by mask for 8:30 and MultiIndex:

    #simplier set MutiIndex
    df['Date'] = pd.to_datetime(df['Date'])
    df['hour'] = pd.to_datetime(df['hour'], format='%H:%M:%S')
    
    df = df.set_index(['Symbol', 'Date'])
    

    mask = df['hour'].dt.strftime('%H:%M').eq('08:30') & 
           (df.index == ('SPXW 250715C06310000', pd.Timestamp('2025-07-14')))
    
    df.loc[mask, 'day_high'] = df.loc[('SPXW 250715C06310000', '2025-07-14')]['high'].max()
    

    print (df)
    
                                     open  high   low  close  volume  \
    Symbol               Date                                          
    SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14   
                         2025-07-14  2.80  2.80  2.50   2.53      61   
                         2025-07-14  2.45  2.45  2.45   2.45       2   
                         2025-07-14  2.58  2.80  2.58   2.60      32   
                         2025-07-14  2.50  2.50  2.25   2.30       5   
    SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246   
                         2025-07-09  0.05  0.10  0.03   0.07     452   
                         2025-07-09  0.07  0.10  0.05   0.07     137   
                         2025-07-09  0.07  0.07  0.07   0.07       5   
                         2025-07-09  0.07  0.07  0.05   0.05     225   
    SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2   
                         2025-07-11  7.20  7.20  7.20   7.20       2   
                         2025-07-11  6.92  6.92  6.92   6.92      20   
                         2025-07-11  6.58  6.58  6.58   6.58       1   
                         2025-07-11  6.41  6.41  6.41   6.41       2   
    
                                                   hour  day_high  
    Symbol               Date                                      
    SPXW 250715C06310000 2025-07-14 1900-01-01 08:30:00      2.87  
                         2025-07-14 1900-01-01 08:31:00       NaN  
                         2025-07-14 1900-01-01 08:32:00       NaN  
                         2025-07-14 1900-01-01 08:33:00       NaN  
                         2025-07-14 1900-01-01 08:34:00       NaN  
    SPXW 250709C06345000 2025-07-09 1900-01-01 08:30:00       NaN  
                         2025-07-09 1900-01-01 08:31:00       NaN  
                         2025-07-09 1900-01-01 08:32:00       NaN  
                         2025-07-09 1900-01-01 08:33:00       NaN  
                         2025-07-09 1900-01-01 08:34:00       NaN  
    SPXW 250715C06310000 2025-07-11 1900-01-01 08:30:00       NaN  
                         2025-07-11 1900-01-01 08:31:00       NaN  
                         2025-07-11 1900-01-01 08:32:00       NaN  
                         2025-07-11 1900-01-01 08:34:00       NaN  
                         2025-07-11 1900-01-01 08:35:00       NaN  
    

    For second use GroupBy.max:

    mask = df['hour'].dt.strftime('%H:%M').eq('08:30')
    df.loc[mask, 'day_high'] = df.groupby(['Symbol','Date'])['high'].max()
    

    Or GroupBy.transform with Series.where for set only rows with 8:30:

    mask = df['hour'].dt.strftime('%H:%M').eq('08:30')
    df['day_high'] = df.groupby(['Symbol','Date'])['high'].transform('max').where(mask)
    

    print (df)
    
                                     open  high   low  close  volume  \
    Symbol               Date                                          
    SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14   
                         2025-07-14  2.80  2.80  2.50   2.53      61   
                         2025-07-14  2.45  2.45  2.45   2.45       2   
                         2025-07-14  2.58  2.80  2.58   2.60      32   
                         2025-07-14  2.50  2.50  2.25   2.30       5   
    SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246   
                         2025-07-09  0.05  0.10  0.03   0.07     452   
                         2025-07-09  0.07  0.10  0.05   0.07     137   
                         2025-07-09  0.07  0.07  0.07   0.07       5   
                         2025-07-09  0.07  0.07  0.05   0.05     225   
    SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2   
                         2025-07-11  7.20  7.20  7.20   7.20       2   
                         2025-07-11  6.92  6.92  6.92   6.92      20   
                         2025-07-11  6.58  6.58  6.58   6.58       1   
                         2025-07-11  6.41  6.41  6.41   6.41       2   
    
                                                   hour  day_high  
    Symbol               Date                                      
    SPXW 250715C06310000 2025-07-14 1900-01-01 08:30:00      2.87  
                         2025-07-14 1900-01-01 08:31:00       NaN  
                         2025-07-14 1900-01-01 08:32:00       NaN  
                         2025-07-14 1900-01-01 08:33:00       NaN  
                         2025-07-14 1900-01-01 08:34:00       NaN  
    SPXW 250709C06345000 2025-07-09 1900-01-01 08:30:00      0.10  
                         2025-07-09 1900-01-01 08:31:00       NaN  
                         2025-07-09 1900-01-01 08:32:00       NaN  
                         2025-07-09 1900-01-01 08:33:00       NaN  
                         2025-07-09 1900-01-01 08:34:00       NaN  
    SPXW 250715C06310000 2025-07-11 1900-01-01 08:30:00      7.30  
                         2025-07-11 1900-01-01 08:31:00       NaN  
                         2025-07-11 1900-01-01 08:32:00       NaN  
                         2025-07-11 1900-01-01 08:34:00       NaN  
                         2025-07-11 1900-01-01 08:35:00       NaN