I am currently working on creating heatmaps using the PyComplexHeatmap package in Python. I have a dataframe that I want to use to build heatmaps, and I encounter an error when attempting to plot the heatmap. The error message states:
Starting plotting..
Starting calculating row orders..
Reordering rows...
ValueError: The condensed distance matrix must contain only finite values.
This is my code:
import pandas as pd
from PyComplexHeatmap import ClusterMapPlotter, HeatmapAnnotation, anno_label
data = {
    'Geneid': ['K20859', 'K16698', 'K20859', 'K03781', 'K07452', 'K19147', 'K16698', 'K16698', 'K03781', 'K16698'],
    'Diagnosis': ['iRBD', 'iRBD', 'PD', 'PD', 'PD', 'PD', 'Ctrl', 'PD', 'PD', 'PD'],
    'G': ['DTU008', 'Methanosphaera', 'Methanomassiliicoccus_A', 'Methanomethylophilus', 'Methanomethylophilus', 'Methanomethylophilus', 'Methanosphaera', 'Methanobrevibacter_A', 'Methanomassiliicoccus_A', 'Methanosphaera'],
    'tpm': [0.384566, 0.614127, 1.264605, 1.361017, 1.536711, 1.727445, 2.444317, 2.745661, 3.101456, 3.288112]
}
df_G_level = pd.DataFrame(data)
pivot_tables = {}
diagnosis_values = df_G_level['Diagnosis'].unique()
for diagnosis in diagnosis_values:
    filtered_df = df_G_level[df_G_level['Diagnosis'] == diagnosis]
    pivot_table = filtered_df.pivot_table(index='Geneid', columns='G', values='tpm', aggfunc='sum', fill_value=1e-6)
    # Reindex so every diagnosis shares the same rows and columns; missing cells get the 1e-6 fill
    pivot_table = pivot_table.reindex(index=df_G_level['Geneid'].unique(), columns=df_G_level['G'].unique(), fill_value=1e-6)
    pivot_tables[diagnosis] = pivot_table
df_Ctrl = pivot_tables['Ctrl']
row_ha = HeatmapAnnotation(selected=anno_label(df_Ctrl.index.to_frame(), colors='black'), axis=0, verbose=0, orientation='right')
cm1 = ClusterMapPlotter(data=df_Ctrl, left_annotation=None, show_rownames=True, show_colnames=True, row_dendrogram=False, col_dendrogram=False, cmap='Purples', rasterized=True, row_split_gap=0.1, center=0.5, plot=True, label='tpm')
I have already ensured that there are no infinite values in the matrix because I substitute non-existing values with 1e-6 during the pivot table creation. However, I am still encountering the mentioned error. Could you please help me identify the problem and provide a possible solution?
If you set col_cluster=False and row_cluster=False, the code runs successfully.
You get the error because many rows and columns hold the same value throughout (the fill value), so finite pairwise distances, and therefore the linkage, cannot be computed for them.
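To see where the message comes from, here is a minimal sketch that reproduces it with SciPy directly. It assumes the clustering step uses a correlation-based distance, which I believe is the PyComplexHeatmap default but have not confirmed for your version. The small matrix X is made up for illustration and is not your data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Hypothetical toy matrix: the first row is constant, like a row that
# consists only of the fill value after reindexing.
X = np.array([
    [1.0, 1.0, 1.0, 1.0],  # constant row, zero variance
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 1.0, 2.0],
])

# Correlation distance divides by the standard deviation of each row,
# so any pair involving the constant row comes out as NaN.
d_corr = pdist(X, metric="correlation")
has_nan = bool(np.isnan(d_corr).any())

# linkage() rejects a condensed distance matrix with non-finite entries.
try:
    linkage(d_corr, method="average")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Euclidean distance stays finite for constant rows, so linkage succeeds.
d_euc = pdist(X, metric="euclidean")
Z = linkage(d_euc, method="average")

print("NaN in correlation distances:", has_nan)
print("linkage error:", error_message)
print("euclidean linkage shape:", Z.shape)
```

So the values in the matrix can all be finite while the distances are not. Besides switching clustering off, dropping the constant rows and columns before plotting, or using a distance metric that does not rely on variance (if the plotting function lets you choose one), should avoid the problem as well.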