r complexheatmap

Getting different hierarchical clustering in ComplexHeatmap for the same method of performing clustering


I'm using the following code to draw a heatmap with ComplexHeatmap in R. I tried two methods, marked in the code comments. The only difference is that in the first I let ComplexHeatmap do the clustering, while in the second I do the clustering myself and pass the hclust object to Heatmap. In theory both should cluster the samples identically, but they don't, and I'm not sure what I'm misunderstanding. I know the heatmaps themselves should differ (the first shows the data, the second the correlation matrix), but I believe the clustering of the samples should be the same. I've shared the data in this GitHub post. PS: The clustering is similar, but displayed in reversed order, and not exactly the same.

library(ComplexHeatmap)
library(circlize)  # provides colorRamp2()

lung = read.csv("all_lung.csv")
# Collapse four subtypes into a single "meso" group
meso_subtypes = c("Not.Otherwise.Specified", "Epithelioid",
                  "Sarcomatoid", "Biphasic")
lung[["subtype_grouped_meso"]] = lung[["subtype"]]
lung[lung[["subtype"]] %in% meso_subtypes, "subtype_grouped_meso"] = "meso"
subtype = lung[["subtype_grouped_meso"]]
rownames(lung) = lung[["X"]]
# chr_keep (the columns to keep) is defined elsewhere and not shown here
lung = lung[, chr_keep]

subtype_colors <- c(
  "Adeno" = "red",
  "Squamous" = "green",
  "SCLC" = "blue",
  "meso" = "orange"
)

lung = t(lung)
column_ha <- HeatmapAnnotation(subtype = subtype, 
                               counts = log10(colSums(lung)),
                               col = list(subtype = 
                                            subtype_colors))

# Method 1
Heatmap(lung,
        top_annotation = column_ha,
        clustering_distance_columns = "pearson",
        clustering_method_columns = "ward.D",
        cluster_rows = FALSE,
        show_column_names = FALSE,
        show_row_names = FALSE,
        show_row_dend = FALSE)

# Method 2 (manually do the clustering) 
cor_matrix <- cor(lung, method = "pearson")
cor_distance <- as.dist(1 - cor_matrix)
hc <- hclust(cor_distance, method = "ward.D")
Heatmap(cor_matrix, 
        top_annotation = column_ha,
        name = "correlation",
        cluster_columns = hc,
        cluster_rows = hc,
        show_column_names = FALSE,
        show_row_names = FALSE,
        show_row_dend = FALSE,
        col = colorRamp2(c(-1, 0, 1), c("blue", "white", "red")))

Solution

  • I found a similar question here. The punchline is best described by the documentation of the reorder.dendrogram function from the stats package:

    There are many different orderings of a dendrogram that are consistent with the structure imposed. This function takes a dendrogram and a vector of values and reorders the dendrogram in the order of the supplied vector, maintaining the constraints on the dendrogram.
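
To make this concrete, here is a minimal sketch in base R. It uses a small random matrix as a stand-in for the lung data (the values, the sample names s1..s10 and the helper coph() are made up for illustration). It builds one tree with the same distance and linkage as in the question, then flips and reorders its branches. The leaf order changes, but the cophenetic distances, which encode the tree structure, stay the same. If I read the ComplexHeatmap documentation correctly, Heatmap() applies a reordering of this kind by default only when it does the clustering itself (the column_dend_reorder argument), which would explain why the two methods show the same tree with the leaves in a different order.

```r
# Toy data standing in for the real matrix (hypothetical values)
set.seed(1)
m <- matrix(rnorm(200), nrow = 20, ncol = 10,
            dimnames = list(NULL, paste0("s", 1:10)))

# Same distance and linkage as in the question
d    <- as.dist(1 - cor(m, method = "pearson"))
hc   <- hclust(d, method = "ward.D")
dend <- as.dendrogram(hc)

# Two alternative orderings of the same tree:
# a full left-to-right flip, and a reordering by per-sample weights
dend_rev <- rev(dend)
dend_wts <- reorder(dend, wts = -colMeans(m), agglo.FUN = mean)

# Helper (hypothetical name): cophenetic distances as a matrix with
# rows and columns sorted by label, so trees can be compared directly
coph <- function(x) {
  cm   <- as.matrix(cophenetic(x))
  labs <- sort(rownames(cm))
  cm[labs, labs]
}

# The leaf order differs between the orderings ...
print(labels(dend))
print(labels(dend_rev))
print(labels(dend_wts))

# ... but the tree structure is unchanged
same_tree_rev <- isTRUE(all.equal(coph(dend), coph(dend_rev)))
same_tree_wts <- isTRUE(all.equal(coph(dend), coph(dend_wts)))
print(c(rev = same_tree_rev, wts = same_tree_wts))
```

Under that assumption, passing column_dend_reorder = FALSE in Method 1 should be worth trying when the leaf orders need to match exactly.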