I am multiplying a DataFrame of volatilities (rvm) with a DataFrame of correlations (omega_tilde) to obtain a covariance matrix.
The rvm DataFrame (5790 rows × 10 columns):
NoDur Durbl Manuf Enrgy HiTec Telcm Shops Hlth Utils Other
Date lvl1
1972-11-30 NoDur 0.006660 0 0 0 0 0 0 0 0
Durbl 0 0.00939 0 0 0 0 0 0 0 0
Manuf 0 0 0.00803 0 0 0 0 0 0 0
Enrgy 0 0 0 0.00851 0 0 0 0 0 0
HiTec 0 0 0 0 0.01205 0 0 0 0 0
Telcm 0 0 0 0 0 0.00799 0 0 0 0
Shops 0 0 0 0 0 0 0.00795 0 0 0
Hlth 0 0 0 0 0 0 0 0.00819 0 0
Utils 0 0 0 0 0 0 0 0 0.00505 0
Other 0 0 0 0 0 0 0 0 0 0.00892
1972-11-31 NoDur 0.006640 0 0 0 0 0 0 0 0
Durbl 0 0.00943 0 0 0 0 0 0 0 0
Manuf 0 0 0.00800 0 0 0 0 0 0 0
Enrgy 0 0 0 0.00837 0 0 0 0 0 0
HiTec 0 0 0 0 0.01185 0 0 0 0 0
Telcm 0 0 0 0 0 0.00792 0 0 0 0
Shops 0 0 0 0 0 0 0.00794 0 0 0
Hlth 0 0 0 0 0 0 0 0.00804 0 0
Utils 0 0 0 0 0 0 0 0 0.00504 0
Other 0 0 0 0 0 0 0 0 0 0.00889
The omega_tilde DataFrame (5790 rows × 10 columns):
NoDur Durbl Manuf Enrgy HiTec Telcm Shops Hlth Utils Other
Date level_1
2021-01-31 NoDur 1.00000 0.62369 0.87367 0.65322 0.74356 0.84011 0.77417 0.80183 0.82833 0.84094
Durbl 0.62369 1.00000 0.69965 0.57501 0.70125 0.60104 0.68652 0.61333 0.45301 0.70556
Manuf 0.87367 0.69965 1.00000 0.78599 0.81415 0.84477 0.80932 0.82127 0.74803 0.94673
Enrgy 0.65322 0.57501 0.78599 1.00000 0.59940 0.67492 0.58058 0.61946 0.57830 0.81593
HiTec 0.74356 0.70125 0.81415 0.59940 1.00000 0.75436 0.91318 0.84508 0.59302 0.81109
Telcm 0.84011 0.60104 0.84477 0.67492 0.75436 1.00000 0.77555 0.77342 0.73186 0.85595
Shops 0.77417 0.68652 0.80932 0.58058 0.91318 0.77555 1.00000 0.81197 0.61574 0.79932
Hlth 0.80183 0.61333 0.82127 0.61946 0.84508 0.77342 0.81197 1.00000 0.70032 0.80875
Utils 0.82833 0.45301 0.74803 0.57830 0.59302 0.73186 0.61574 0.70032 1.00000 0.72739
Other 0.84094 0.70556 0.94673 0.81593 0.81109 0.85595 0.79932 0.80875 0.72739 1.00000
2021-02-28 NoDur 1.00000 0.61544 0.87041 0.64622 0.73941 0.83792 0.77075 0.79993 0.82813 0.83937
Durbl 0.61544 1.00000 0.69464 0.55865 0.70203 0.59109 0.68265 0.60963 0.44792 0.69685
Manuf 0.87041 0.69464 1.00000 0.78243 0.81121 0.84189 0.80395 0.81809 0.74489 0.94605
Enrgy 0.64622 0.55865 0.78243 1.00000 0.58911 0.67134 0.56925 0.61252 0.56865 0.81365
HiTec 0.73941 0.70203 0.81121 0.58911 1.00000 0.74904 0.91274 0.84179 0.58973 0.80581
Telcm 0.83792 0.59109 0.84189 0.67134 0.74904 1.00000 0.77078 0.76844 0.72814 0.85493
Shops 0.77075 0.68265 0.80395 0.56925 0.91274 0.77078 1.00000 0.80924 0.61446 0.79342
Hlth 0.79993 0.60963 0.81809 0.61252 0.84179 0.76844 0.80924 1.00000 0.69965 0.80394
Utils 0.82813 0.44792 0.74489 0.56865 0.58973 0.72814 0.61446 0.69965 1.00000 0.72542
Other 0.83937 0.69685 0.94605 0.81365 0.80581 0.85493 0.79342 0.80394 0.72542 1.00000
The code I tried:
sigma_tilde = omega_tilde.groupby(level='Date').apply(lambda g: rvm_diag.loc[g.name].dot(g.values@(rvm_diag.loc[g.name])))
The error I obtain:
ValueError: matrices are not aligned
EDIT: I also tried the following:
reshaped = omega_tilde.values.reshape(omega_tilde.index.levels[0].nunique(), omega_tilde.index.levels[1].nunique(), omega_tilde.shape[-1])
np.einsum('ijk,ik->ijk', rvm_diag.values, np.einsum('ijk,ik->ij', reshaped, rvm_diag.values))
The error here:
ValueError: operands could not be broadcast together with remapped shapes [original->remapped]: (579,10,10)->(579,10,10) (5790,10)->(5790,newaxis,10)
The output I would like to have is of the same format as the omega_tilde DataFrame, so a matrix for each day.
Any help is appreciated. Thanks!
The code that gave you the ValueError: matrices are not aligned
just needs .values
added for the matrix multiplication to work and you can use @
for both multiplication steps so you get a DataFrame back.
sigma_tilde = (
omega_tilde
.groupby(level='Date')
.apply(lambda g: rvm.loc[g.name].values@(g.values@(rvm.loc[g.name]))
)
# additional step to change the second level of index
sigma_tilde.index.set_levels(omega_tilde.columns, 1, inplace=True)
On a smaller example (top-left 3x3 quadrants of your DFs above, but with same values for two months and using the same two months for the two DFs):
omega_tilde = pd.DataFrame(
np.array(
[[1.00000, 0.62369, 0.87367],
[0.62369, 1.00000, 0.69965],
[0.87367, 0.69965, 1.00000],
[1.00000, 0.62369, 0.87367],
[0.62369, 1.00000, 0.69965],
[0.87367, 0.69965, 1.00000]]
),
index = pd.MultiIndex.from_arrays(
[[pd.Timestamp('2021-01-31'), pd.Timestamp('2021-01-31'),
pd.Timestamp('2021-01-31'), pd.Timestamp("2021-02-28"),
pd.Timestamp("2021-02-28"), pd.Timestamp("2021-02-28")],
['NoDur', 'Durbl', 'Manuf']*2],
names=['Date', 'level_1']
),
columns = ['NoDur', 'Durbl', 'Manuf']
)
rvm = pd.DataFrame(
np.array(
[[0.00666, 0, 0],
[0, 0.00939, 0],
[0, 0, 0.00803],
[0.00666, 0, 0],
[0, 0.00939, 0],
[0, 0, 0.00803]]
),
index = pd.MultiIndex.from_arrays(
[[pd.Timestamp('2021-01-31'), pd.Timestamp('2021-01-31'),
pd.Timestamp('2021-01-31'), pd.Timestamp("2021-02-28"),
pd.Timestamp("2021-02-28"), pd.Timestamp("2021-02-28")],
['NoDur', 'Durbl', 'Manuf']*2],
names=['Date', 'level_1']
),
columns = ['NoDur', 'Durbl', 'Manuf']
)
the multiplication code will produce:
level_1 NoDur Durbl Manuf
2021-01-31 NoDur 0.000044 0.000039 0.000047
Durbl 0.000039 0.000088 0.000053
Manuf 0.000047 0.000053 0.000064
2021-02-28 NoDur 0.000044 0.000039 0.000047
Durbl 0.000039 0.000088 0.000053
Manuf 0.000047 0.000053 0.000064