Consider the following dataframe example:
id date hrz tenor 1 2 3 4
AAA 16/03/2010 2 6m 0.54 0.54 0.78 0.19
AAA 30/03/2010 2 6m 0.05 0.67 0.20 0.03
AAA 13/04/2010 2 6m 0.64 0.32 0.13 0.20
AAA 27/04/2010 2 6m 0.99 0.53 0.38 0.97
AAA 11/05/2010 2 6m 0.46 0.90 0.11 0.14
AAA 25/05/2010 2 6m 0.41 0.06 0.96 0.31
AAA 08/06/2010 2 6m 0.19 0.73 0.58 0.80
AAA 22/06/2010 2 6m 0.40 0.95 0.14 0.56
AAA 06/07/2010 2 6m 0.22 0.74 0.85 0.94
AAA 20/07/2010 2 6m 0.34 0.17 0.03 0.77
AAA 03/08/2010 2 6m 0.13 0.32 0.39 0.95
AAA 16/03/2010 2 1y 0.54 0.54 0.78 0.19
AAA 30/03/2010 2 1y 0.05 0.67 0.20 0.03
AAA 13/04/2010 2 1y 0.64 0.32 0.13 0.20
AAA 27/04/2010 2 1y 0.99 0.53 0.38 0.97
AAA 11/05/2010 2 1y 0.46 0.90 0.11 0.14
AAA 25/05/2010 2 1y 0.41 0.06 0.96 0.31
AAA 08/06/2010 2 1y 0.19 0.73 0.58 0.80
AAA 22/06/2010 2 1y 0.40 0.95 0.14 0.56
AAA 06/07/2010 2 1y 0.22 0.74 0.85 0.94
AAA 20/07/2010 2 1y 0.34 0.17 0.03 0.77
AAA 03/08/2010 2 1y 0.13 0.32 0.39 0.95
How can I grouby
the variables id, hrz
and tenor
and apply the following custom functions across the dates?
def ks_test(x):
return scipy.stats.kstest(np.sort(x), 'uniform')[0]
def cvm_test(x):
n = len(x)
i = np.arange(1, n + 1)
x = np.sort(x)
w2 = (1 / (12 * n)) + np.sum((x - ((2 * i - 1) / (2 * n))) ** 2)
return w2
The desired output is the following dataframe (figure results are just examples):
id hrz tenor test 1 2 3 4
AAA 2 6m ks_test 0.04 0.06 0.02 0.03
AAA 2 6m cvm_test 0.09 0.17 0.03 0.05
AAA 2 1y ks_test 0.04 0.06 0.02 0.03
AAA 2 1y cvm_test 0.09 0.17 0.03 0.05
Use GroupBy.agg
with DataFrame.stack
for reshape last level of MultiIndex in columns:
cols = ['id','hrz', 'tenor']
out = (df.groupby(cols)[df.columns.difference(cols + ['date'], sort=False)]
.agg([ks_test, cvm_test])
.rename_axis([None, 'test'], axis=1)
.stack(future_stack=True)
.reset_index())
print (out)
id hrz tenor test 1 2 3 4
0 AAA 2 1y ks_test 0.278182 0.166364 0.254545 0.224545
1 AAA 2 1y cvm_test 0.220803 0.044730 0.158839 0.118321
2 AAA 2 6m ks_test 0.278182 0.166364 0.254545 0.224545
3 AAA 2 6m cvm_test 0.220803 0.044730 0.158839 0.118321
How it working:
print (df.groupby(cols)[df.columns.difference(cols +['date'], sort=False)]
.agg([ks_test, cvm_test]))
1 2 3 \
ks_test cvm_test ks_test cvm_test ks_test cvm_test
id hrz tenor
AAA 2 1y 0.278182 0.220803 0.166364 0.04473 0.254545 0.158839
6m 0.278182 0.220803 0.166364 0.04473 0.254545 0.158839
4
ks_test cvm_test
id hrz tenor
AAA 2 1y 0.224545 0.118321
6m 0.224545 0.118321