Hi all, I have a correlation matrix of 21 industry sectors. I want to split these 21 sectors into 4 or 5 groups, with sectors that behave similarly grouped together.
Could someone shed some light on how to do this in Python? Thanks very much in advance!
UPDATE: This answer is wrong, and your clustering will not work correctly. Do not use it and read the explanation in Martijn Courteaux's answer below.
You might explore the use of Pandas DataFrame.corr and the scipy.cluster hierarchical clustering package:
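import pandas as pd
import scipy.cluster.hierarchy as spc

# my_data is a placeholder for the raw observations (one column per sector)
df = pd.DataFrame(my_data)
corr = df.corr().values

# Caution: this takes Euclidean distances between rows of the correlation matrix
pdist = spc.distance.pdist(corr)
linkage = spc.linkage(pdist, method='complete')
# Cut the tree at half the largest distance; idx holds one cluster label per sector
idx = spc.fcluster(linkage, 0.5 * pdist.max(), 'distance')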
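As the update above warns, calling pdist on the correlation matrix measures how far apart its rows are rather than using the correlations themselves as a similarity. A minimal sketch of one common alternative follows, assuming that distance is defined as 1 - correlation (so strongly correlated sectors are close) and that a fixed number of groups is wanted. The random data, the column names sector_0 to sector_5, and the variable n_groups are made up for illustration; this is not necessarily the exact method from the other answer.

```python
import numpy as np
import pandas as pd
import scipy.cluster.hierarchy as spc
from scipy.spatial.distance import squareform

# Hypothetical data: 200 observations of 6 "sectors" driven by 2 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
noise = 0.3 * rng.normal(size=(200, 6))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]])
df = pd.DataFrame(factors @ loadings + noise,
                  columns=[f"sector_{i}" for i in range(6)])

corr = df.corr().values

# Assumption: highly correlated sectors should be close, so distance = 1 - corr.
dist = 1.0 - corr
dist = (dist + dist.T) / 2.0   # remove any tiny floating-point asymmetry
np.fill_diagonal(dist, 0.0)    # squareform expects exact zeros on the diagonal

# linkage expects the condensed (upper-triangle) form of the distance matrix.
condensed = squareform(dist, checks=True)
linkage = spc.linkage(condensed, method='complete')

# Ask for a fixed number of groups instead of a distance threshold.
n_groups = 2
idx = spc.fcluster(linkage, t=n_groups, criterion='maxclust')

# One cluster label per sector, indexed by sector name.
groups = pd.Series(idx, index=df.columns)
print(groups)
```

For the 21-sector case in the question, n_groups would be set to 4 or 5. If negatively correlated sectors should also count as similar, 1 - abs(corr) is another possible distance; which definition fits depends on what "similar behavior" means for your data.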