pythonpandasdataframemulti-indexcovariance-matrix

Multiply Correlation and Volatility Dataframes with Multi-Index to Get Covariance Matrix


I am multiplying a DataFrame of volatilities (rvm) with a DataFrame of correlations (omega_tilde) to obtain a covariance matrix.

The rvm DataFrame (5790 rows × 10 columns):

                     NoDur  Durbl   Manuf   Enrgy   HiTec   Telcm   Shops   Hlth    Utils   Other
Date          lvl1                              
1972-11-30    NoDur  0.006660       0       0       0       0       0       0       0       0
              Durbl  0      0.00939 0       0       0       0       0       0       0       0
              Manuf  0      0       0.00803 0       0       0       0       0       0       0
              Enrgy  0      0       0       0.00851 0       0       0       0       0       0
              HiTec  0      0       0       0       0.01205 0       0       0       0       0
              Telcm  0      0       0       0       0       0.00799 0       0       0       0
              Shops  0      0       0       0       0       0       0.00795 0       0       0
              Hlth   0      0       0       0       0       0       0       0.00819 0       0
              Utils  0      0       0       0       0       0       0       0       0.00505 0
              Other  0      0       0       0       0       0       0       0       0       0.00892
1972-11-31    NoDur  0.006640       0       0       0       0       0       0       0       0
              Durbl  0      0.00943 0       0       0       0       0       0       0       0
              Manuf  0      0       0.00800 0       0       0       0       0       0       0
              Enrgy  0      0       0       0.00837 0       0       0       0       0       0
              HiTec  0      0       0       0       0.01185 0       0       0       0       0
              Telcm  0      0       0       0       0       0.00792 0       0       0       0
              Shops  0      0       0       0       0       0       0.00794 0       0       0
              Hlth   0      0       0       0       0       0       0       0.00804 0       0
              Utils  0      0       0       0       0       0       0       0       0.00504 0
              Other  0      0       0       0       0       0       0       0       0       0.00889

         

The omega_tilde DataFrame (5790 rows × 10 columns):

                     NoDur   Durbl   Manuf   Enrgy   HiTec   Telcm   Shops   Hlth    Utils   Other
Date        level_1                                     
2021-01-31  NoDur    1.00000 0.62369 0.87367 0.65322 0.74356 0.84011 0.77417 0.80183 0.82833 0.84094
            Durbl    0.62369 1.00000 0.69965 0.57501 0.70125 0.60104 0.68652 0.61333 0.45301 0.70556
            Manuf    0.87367 0.69965 1.00000 0.78599 0.81415 0.84477 0.80932 0.82127 0.74803 0.94673
            Enrgy    0.65322 0.57501 0.78599 1.00000 0.59940 0.67492 0.58058 0.61946 0.57830 0.81593
            HiTec    0.74356 0.70125 0.81415 0.59940 1.00000 0.75436 0.91318 0.84508 0.59302 0.81109
            Telcm    0.84011 0.60104 0.84477 0.67492 0.75436 1.00000 0.77555 0.77342 0.73186 0.85595
            Shops    0.77417 0.68652 0.80932 0.58058 0.91318 0.77555 1.00000 0.81197 0.61574 0.79932
            Hlth     0.80183 0.61333 0.82127 0.61946 0.84508 0.77342 0.81197 1.00000 0.70032 0.80875
            Utils    0.82833 0.45301 0.74803 0.57830 0.59302 0.73186 0.61574 0.70032 1.00000 0.72739
            Other    0.84094 0.70556 0.94673 0.81593 0.81109 0.85595 0.79932 0.80875 0.72739 1.00000
2021-02-28  NoDur    1.00000 0.61544 0.87041 0.64622 0.73941 0.83792 0.77075 0.79993 0.82813 0.83937
            Durbl    0.61544 1.00000 0.69464 0.55865 0.70203 0.59109 0.68265 0.60963 0.44792 0.69685 
            Manuf    0.87041 0.69464 1.00000 0.78243 0.81121 0.84189 0.80395 0.81809 0.74489 0.94605
            Enrgy    0.64622 0.55865 0.78243 1.00000 0.58911 0.67134 0.56925 0.61252 0.56865 0.81365
            HiTec    0.73941 0.70203 0.81121 0.58911 1.00000 0.74904 0.91274 0.84179 0.58973 0.80581
            Telcm    0.83792 0.59109 0.84189 0.67134 0.74904 1.00000 0.77078 0.76844 0.72814 0.85493
            Shops    0.77075 0.68265 0.80395 0.56925 0.91274 0.77078 1.00000 0.80924 0.61446 0.79342
            Hlth     0.79993 0.60963 0.81809 0.61252 0.84179 0.76844 0.80924 1.00000 0.69965 0.80394
            Utils    0.82813 0.44792 0.74489 0.56865 0.58973 0.72814 0.61446 0.69965 1.00000 0.72542
            Other    0.83937 0.69685 0.94605 0.81365 0.80581 0.85493 0.79342 0.80394 0.72542 1.00000

The code I tried:

sigma_tilde = omega_tilde.groupby(level='Date').apply(lambda g: rvm_diag.loc[g.name].dot(g.values@(rvm_diag.loc[g.name])))

The error I obtain:

ValueError: matrices are not aligned

EDIT: I also tried the following:

 reshaped = omega_tilde.values.reshape(omega_tilde.index.levels[0].nunique(), omega_tilde.index.levels[1].nunique(), omega_tilde.shape[-1])
 np.einsum('ijk,ik->ijk', rvm_diag.values, np.einsum('ijk,ik->ij', reshaped, rvm_diag.values))

The error here:

 ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (579,10,10)->(579,10,10) (5790,10)->(5790,newaxis,10) 

The output I would like to have is of the same format as the omega_tilde DataFrame, so a matrix for each day.

Any help is appreciated. Thanks!


Solution

  • The code that gave you the ValueError: matrices are not aligned just needs .values added for the matrix multiplication to work and you can use @ for both multiplication steps so you get a DataFrame back.

    sigma_tilde = (
        omega_tilde
        .groupby(level='Date')
        .apply(lambda g: rvm.loc[g.name].values@(g.values@(rvm.loc[g.name]))
    )
    
    # additional step to change the second level of index
    sigma_tilde.index.set_levels(omega_tilde.columns, 1, inplace=True)
    

    On a smaller example (top-left 3x3 quadrants of your DFs above, but with same values for two months and using the same two months for the two DFs):

    omega_tilde = pd.DataFrame(
        np.array(
            [[1.00000, 0.62369, 0.87367],
             [0.62369, 1.00000, 0.69965],
             [0.87367, 0.69965, 1.00000],
             [1.00000, 0.62369, 0.87367],
             [0.62369, 1.00000, 0.69965],
             [0.87367, 0.69965, 1.00000]]
        ),
        index = pd.MultiIndex.from_arrays(
            [[pd.Timestamp('2021-01-31'), pd.Timestamp('2021-01-31'),
              pd.Timestamp('2021-01-31'), pd.Timestamp("2021-02-28"),
              pd.Timestamp("2021-02-28"), pd.Timestamp("2021-02-28")],
             ['NoDur', 'Durbl', 'Manuf']*2],
             names=['Date', 'level_1']
        ),
        columns = ['NoDur', 'Durbl', 'Manuf']
    )
    
    rvm = pd.DataFrame(
        np.array(
            [[0.00666, 0, 0],
             [0, 0.00939, 0],
             [0, 0, 0.00803],
             [0.00666, 0, 0],
             [0, 0.00939, 0],
             [0, 0, 0.00803]]
        ),
        index = pd.MultiIndex.from_arrays(
            [[pd.Timestamp('2021-01-31'), pd.Timestamp('2021-01-31'),
              pd.Timestamp('2021-01-31'), pd.Timestamp("2021-02-28"),
              pd.Timestamp("2021-02-28"), pd.Timestamp("2021-02-28")],
             ['NoDur', 'Durbl', 'Manuf']*2],
             names=['Date', 'level_1']
        ),
        columns = ['NoDur', 'Durbl', 'Manuf']
    )
    

    the multiplication code will produce:

                 level_1       NoDur       Durbl       Manuf
    2021-01-31     NoDur    0.000044    0.000039    0.000047
                   Durbl    0.000039    0.000088    0.000053
                   Manuf    0.000047    0.000053    0.000064
    2021-02-28     NoDur    0.000044    0.000039    0.000047
                   Durbl    0.000039    0.000088    0.000053
                   Manuf    0.000047    0.000053    0.000064