pythonpandasgroup-byvectorizationyfinance

How to correctly add columns to the original multi-index DataFrame after groupby()?


There is a DataFrame with option tickers at the zero level, prices (open, close, high, and low) at the first level, and option types at the second level (header structure: 'Ticker', 'open/close/high/low', 'Type'). The task: for each ticker, calculate the average price, then add columns with the average prices (header structure: 'Ticker', 'Average', 'Type') to the original DataFrame. My script executes almost correctly, but it corrupts the column names of the original DataFrame.

Code:

import pandas as pd
import numpy as np

df_f_avg = pd.DataFrame({
    ('a1c9', 'open',  'call'):    [3,      0.50,   3.0], 
    ('a1c9', 'close', 'call'):    [4,      0.35,   3.5], 
    ('a1c9', 'high',  'call'):    [1,      0.20,   1.5],
    ('a1c9', 'low',   'call'):    [2,      0.09,   2.5],
    ('a1p9', 'open',  'put'):     [5,      0.05,   4.0], 
    ('a1p9', 'close', 'put'):     [6,      0.10,   5.5], 
    ('a1p9', 'high',  'put'):     [7,      0.20,   6.0],
    ('a1p9', 'low',   'put'):     [8,      0.31,   7.0]})
df_f_avg.columns.names = ['Ticker', 'Price', 'Type']
df_f_avg

columns_name = df_f_avg.loc[:, pd.IndexSlice[:, 'open']].columns

avg_group_ticker = pd.DataFrame(data=df_f_avg.loc[:, (slice(None), ['open', 'close', 'high', 'low'], slice(None))].T.groupby('Ticker').mean().T)
avg_group_ticker.columns = columns_name
avg_group_ticker.columns._levels[1].values[:] = 'avg'

df_f_avg = pd.concat([df_f_avg, avg_group_ticker], axis=1)
df_f_avg

It turns out like this: …

Ticker                          a1c9                            a1p9    a1c9    a1p9
Price                           avg                             avg     avg     avg
Type    call    call    call    call    put     put     put     put     call    put
0       3.0     4.00    1.0     2.00    5.00    6.0     7.0     8.00    2.500   6.500
1       0.5     0.35    0.2     0.09    0.05    0.1     0.2     0.31    0.285   0.165
2       3.0     3.50    1.5     2.50    4.00    5.5     6.0     7.00    2.625   5.625

But it should look like this: …

Ticker                          a1c9                            a1p9    a1c9    a1p9
Price   open    close   high    low     open    close   high    low     avg     avg
Type    call    call    call    call    put     put     put     put     call    put
0       3.0     4.00    1.0     2.00    5.00    6.0     7.0     8.00    2.500   6.500
1       0.5     0.35    0.2     0.09    0.05    0.1     0.2     0.31    0.285   0.165
2       3.0     3.50    1.5     2.50    4.00    5.5     6.0     7.00    2.625   5.625

Questions:

  1. How can this be fixed (P.S. this is a shortened example, in reality, there will be dozens of tickers)?
  2. Perhaps you know a simpler and more efficient way to solve this task (simpler code, more reliable and universal, faster, less resource-intensive)?

Solution

  • The issue is that you modify the original index since columns_name = df_f_avg.loc[:, pd.IndexSlice[:, 'open']].columns is making a reference to it.

    You must make a deep copy:

    columns_name = df_f_avg.loc[:, pd.IndexSlice[:, 'open']].columns.copy(deep=True)
    

    You could also rename Price and groupby all levels:

    out = df_f_avg.join(df_f_avg.rename(columns=lambda x: 'avg', level='Price')
                                .T.groupby(['Ticker', 'Price', 'Type']).mean().T
                       )
    

    Output:

    Ticker a1c9                   a1p9                    a1c9   a1p9
    Price  open close high   low  open close high   low    avg    avg
    Type   call  call call  call   put   put  put   put   call    put
    0       3.0  4.00  1.0  2.00  5.00   6.0  7.0  8.00  2.500  6.500
    1       0.5  0.35  0.2  0.09  0.05   0.1  0.2  0.31  0.285  0.165
    2       3.0  3.50  1.5  2.50  4.00   5.5  6.0  7.00  2.625  5.625