pythonpandas

Find max value of a column, then find another value in the same row, and copy that value to a new column in Pandas


I have the following frame:

lst = [['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14, '8:30:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61, '8:31:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2, '8:32:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32, '8:33:00'],
       ['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5, '8:34:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5, '8:33:00'],
       ['SPXW 250709C06345000', '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
       ['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2, '8:30:00'],
       ['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2, '8:31:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20, '8:32:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1, '8:34:00'],
       ['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2, '8:35:00']
      ]
df = pd.DataFrame(lst, columns = ['Symbol', 'Date', 'open', 'high', 'low', 'close', 'volume', 'hour'])

df['Date'] = pd.to_datetime(df['Date'])
df['hour'] = pd.to_datetime(df['hour'], format='%H:%M:%S')
df.index = pd.MultiIndex.from_arrays([df['Symbol'], df['Date']], names=['Symbol', 'Date'])
df.drop(columns=['Symbol', 'Date'], inplace=True)
df["day_high"] = ""
df["day_low"] = ""
df["day_high_time"] = ""
df["day_low_time"] = ""

mask = df['hour'].dt.strftime('%H:%M').eq('08:30')
df.loc[mask, 'day_high'] = df.groupby(['Symbol','Date'])['high'].max()
df.loc[mask, 'day_low'] = df.groupby(['Symbol','Date'])['low'].min()

When run, it prints this:

                                open  high   low  close  volume                hour day_high day_low day_high_time day_low_time
Symbol               Date                                                                                                       
SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14 1900-01-01 08:30:00     2.87    2.25                           
                     2025-07-14  2.80  2.80  2.50   2.53      61 1900-01-01 08:31:00                                            
                     2025-07-14  2.45  2.45  2.45   2.45       2 1900-01-01 08:32:00                                            
                     2025-07-14  2.58  2.80  2.58   2.60      32 1900-01-01 08:33:00                                            
                     2025-07-14  2.50  2.50  2.25   2.30       5 1900-01-01 08:34:00                                            
SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246 1900-01-01 08:30:00      0.1    0.03                           
                     2025-07-09  0.05  0.10  0.03   0.07     452 1900-01-01 08:31:00                                            
                     2025-07-09  0.07  0.10  0.05   0.07     137 1900-01-01 08:32:00                                            
                     2025-07-09  0.07  0.07  0.07   0.07       5 1900-01-01 08:33:00                                            
                     2025-07-09  0.07  0.07  0.05   0.05     225 1900-01-01 08:34:00                                            
SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2 1900-01-01 08:30:00      7.3    6.41                           
                     2025-07-11  7.20  7.20  7.20   7.20       2 1900-01-01 08:31:00                                            
                     2025-07-11  6.92  6.92  6.92   6.92      20 1900-01-01 08:32:00                                            
                     2025-07-11  6.58  6.58  6.58   6.58       1 1900-01-01 08:34:00                                            
                     2025-07-11  6.41  6.41  6.41   6.41       2 1900-01-01 08:35:00

My previous question was how to find max value, and stick it in the day_high column where hour is equal to 8:30. It was achieved by these lines of code:

mask = df['hour'].dt.strftime('%H:%M').eq('08:30')
df.loc[mask, 'day_high'] = df.groupby(['Symbol','Date'])['high'].max()

Based on that answer, I also populated the 'day_low' column.

What I want to do now, is find the time at which the max or min value occured. I tried the following:

hour_loc = df.groupby(['Symbol','Date'])['high'].max()
df.loc[mask, 'day_high_time'] = df.loc[df['high'] == hour_loc, 'hour']

When I run that, I get 'Can only compare identically-labeled Series objects' error.

In my previous problem about finding max value, another solution was to use pd.groupby.transform method. Using the logic of that answer, I tried the following:

hour_loc_max = df.groupby(['Symbol','Date'])['high'].transform('max').where(mask)
hour_loc_min = df.groupby(['Symbol','Date'])['low'].transform('min').where(mask)
df.loc[mask, 'day_high_time'] = df.loc[df['high'] == hour_loc_max, 'hour']
df.loc[mask, 'day_low_time'] = df.loc[df['low'] == hour_loc_min, 'hour']

When run, it prints this:

                                open  high   low  close  volume                hour day_high day_low        day_high_time day_low_time
Symbol               Date                                                                                                              
SPXW 250715C06310000 2025-07-14  2.74  2.87  2.60   2.65      14 1900-01-01 08:30:00     2.87    2.25  1900-01-01 08:30:00          NaT
                     2025-07-14  2.80  2.80  2.50   2.53      61 1900-01-01 08:31:00                                                   
                     2025-07-14  2.45  2.45  2.45   2.45       2 1900-01-01 08:32:00                                                   
                     2025-07-14  2.58  2.80  2.58   2.60      32 1900-01-01 08:33:00                                                   
                     2025-07-14  2.50  2.50  2.25   2.30       5 1900-01-01 08:34:00                                                   
SPXW 250709C06345000 2025-07-09  0.05  0.05  0.03   0.03     246 1900-01-01 08:30:00      0.1    0.03                  NaT          NaT
                     2025-07-09  0.05  0.10  0.03   0.07     452 1900-01-01 08:31:00                                                   
                     2025-07-09  0.07  0.10  0.05   0.07     137 1900-01-01 08:32:00                                                   
                     2025-07-09  0.07  0.07  0.07   0.07       5 1900-01-01 08:33:00                                                   
                     2025-07-09  0.07  0.07  0.05   0.05     225 1900-01-01 08:34:00                                                   
SPXW 250715C06310000 2025-07-11  7.30  7.30  7.30   7.30       2 1900-01-01 08:30:00      7.3    6.41  1900-01-01 08:30:00          NaT
                     2025-07-11  7.20  7.20  7.20   7.20       2 1900-01-01 08:31:00                                                   
                     2025-07-11  6.92  6.92  6.92   6.92      20 1900-01-01 08:32:00                                                   
                     2025-07-11  6.58  6.58  6.58   6.58       1 1900-01-01 08:34:00                                                   
                     2025-07-11  6.41  6.41  6.41   6.41       2 1900-01-01 08:35:00      

Here are my questions:

  1. Why does the first method not work? It worked when I wanted to copy max value from high/low columns to day_high/day_low columns at 8:30.

  2. What are NaT? All of the time values should be datetime objects.


Solution

  • My previous question was how to find max value, and stick it in the day_high column where hour is equal to 8:30.

    Don't do that if you don't have to. It's square-peg-round-hole; what you are currently doing is a violation of normal form. Make a separate day summary frame.

    This will be efficient enough:

    import pandas as pd
    
    df = pd.DataFrame(
        columns=['symbol', 'date', 'open', 'high', 'low', 'close', 'volume', 'time'],
        data=[
            ['SPXW 250715C06310000', '7/14/2025', 2.74, 2.87, 2.60, 2.65, 14,  '8:30:00'],
            ['SPXW 250715C06310000', '7/14/2025', 2.80, 2.80, 2.50, 2.53, 61,  '8:31:00'],
            ['SPXW 250715C06310000', '7/14/2025', 2.45, 2.45, 2.45, 2.45, 2,   '8:32:00'],
            ['SPXW 250715C06310000', '7/14/2025', 2.58, 2.80, 2.58, 2.60, 32,  '8:33:00'],
            ['SPXW 250715C06310000', '7/14/2025', 2.50, 2.50, 2.25, 2.30, 5,   '8:34:00'],
            ['SPXW 250709C06345000',  '7/9/2025', 0.05, 0.05, 0.03, 0.03, 246, '8:30:00'],
            ['SPXW 250709C06345000',  '7/9/2025', 0.05, 0.10, 0.03, 0.07, 452, '8:31:00'],
            ['SPXW 250709C06345000',  '7/9/2025', 0.07, 0.10, 0.05, 0.07, 137, '8:32:00'],
            ['SPXW 250709C06345000',  '7/9/2025', 0.07, 0.07, 0.07, 0.07, 5,   '8:33:00'],
            ['SPXW 250709C06345000',  '7/9/2025', 0.07, 0.07, 0.05, 0.05, 225, '8:34:00'],
            ['SPXW 250715C06310000', '7/11/2025', 7.30, 7.30, 7.30, 7.30, 2,   '8:30:00'],
            ['SPXW 250715C06310000', '7/11/2025', 7.20, 7.20, 7.20, 7.20, 2,   '8:31:00'],
            ['SPXW 250715C06310000', '7/11/2025', 6.92, 6.92, 6.92, 6.92, 20,  '8:32:00'],
            ['SPXW 250715C06310000', '7/11/2025', 6.58, 6.58, 6.58, 6.58, 1,   '8:34:00'],
            ['SPXW 250715C06310000', '7/11/2025', 6.41, 6.41, 6.41, 6.41, 2,   '8:35:00'],
        ],
    )
    df['datetime'] = pd.to_datetime(df['date'] + ' ' + df['time'])
    
    summary = df.groupby(
        by=['symbol', pd.Grouper(key='datetime', freq='D')],
    ).agg({'low': ['min', 'idxmin'], 'high': ['max', 'idxmax']})
    summary.columns = 'low', 'low_idx', 'high', 'high_idx'
    
    summary['low_time'] = df['datetime'].iloc[summary['low_idx']].values
    summary['high_time'] = df['datetime'].iloc[summary['high_idx']].values
    summary.drop(columns=['low_idx', 'high_idx'], inplace=True)
    print(summary.T)
    
    symbol    SPXW 250709C06345000 SPXW 250715C06310000                     
    datetime            2025-07-09           2025-07-11           2025-07-14
    low                       0.03                 6.41                 2.25
    high                       0.1                  7.3                 2.87
    low_time   2025-07-09 08:30:00  2025-07-11 08:35:00  2025-07-14 08:34:00
    high_time  2025-07-09 08:31:00  2025-07-11 08:30:00  2025-07-14 08:30:00
    

    If you really need it back in a (bad) form that spreads the day summary across the original data, then you can reindex.

    There is another style where you only aggregate using idx[min|max], which might theoretically speed the thing up if you have massive data:

    idx = df.groupby(
        by=['symbol', pd.Grouper(key='datetime', freq='D')],
    ).agg({'low': 'idxmin', 'high': 'idxmax'})
    
    summary = pd.DataFrame(index=idx.index)
    summary[['low', 'low_time']] = df[['low', 'datetime']].iloc[idx['low']].values
    summary[['high', 'high_time']] = df[['high', 'datetime']].iloc[idx['high']].values
    print(summary.T)