I am looking for an efficient way to plot a dendrogram obtained from a dataset alongside the corresponding distance matrix instead of the original data. I have been curious how different papers show this, and it seems that all they do is plot the heatmap and the dendrogram separately and combine them in image-editing software. Hopefully the following code makes clear what I want. Say I generate the following data and run a hierarchical clustering, using one minus Pearson's correlation as the distance measure and complete linkage as the clustering method:
library(gplots)
set.seed(2)
x <- matrix(rnorm(100), nrow = 5)
dist.fn <- function(x) as.dist(1-cor(t(x)))
hclust.com <- function(x) hclust(x, method="complete")
h.ori <- heatmap.2(x, trace="none", distfun=dist.fn, hclustfun=hclust.com,dendrogram = "row",main = "Fig1")
h.ori$rowInd
# 1 3 5 4 2
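For reference, the same clustering can be reproduced without gplots. This is a minimal base R sketch; the names d and hc are introduced here for illustration only. Note that heatmap.2 by default reorders the dendrogram by row means before reporting rowInd, so hc$order need not match h.ori$rowInd exactly, even though both describe the same tree.

```r
set.seed(2)
x <- matrix(rnorm(100), nrow = 5)

# Correlation-based dissimilarity between rows:
# 0 for perfectly correlated rows, 2 for perfectly anti-correlated rows
d <- as.dist(1 - cor(t(x)))

# Complete-linkage hierarchical clustering on that dissimilarity
hc <- hclust(d, method = "complete")

# Leaf order of the tree: a permutation of the row indices 1..5
hc$order
```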
Now I can plot the corresponding correlation matrix (from which the distance is derived), with its rows and columns ordered by the dendrogram in Fig1 (call this Fig2):
colfunc <- colorRampPalette(c("red", "white", "yellow")) #not really necessary
dmat <- cor(t(x))[h.ori$rowInd,h.ori$rowInd]
heatmap.2(dmat,Rowv = NULL,Colv = "Rowv",scale = 'none',
dendrogram='none',trace = 'none',density.info="none",
labRow = h.ori$rowInd, labCol = h.ori$rowInd,
col=colfunc(20))
Here is my question: how do I add the dendrogram plotted in Fig1 to the heatmap in Fig2 (preferably along both columns and rows)? The purpose is to view the clustering produced by the dendrogram; for block models this would be a nice way to visualize the structure. As a side question: I know how to plot heatmaps with the ggplot2 library, i.e. using geom_tile(). Is there a way to do the same thing with ggplot2?
With regards to doing this in ggplot2: I wrote a function at some point that helps with this, though it is not without flaws. It takes an hclust object and uses it to plot a dendrogram as the axis guide. First we'll grab the dendrogram from the heatmap you had before.
library(gplots)
library(ggplot2)
library(ggh4x)
set.seed(2)
x <- matrix(rnorm(100), nrow = 5)
dist.fn <- function(x) as.dist(1-cor(t(x)))
hclust.com <- function(x) hclust(x, method="complete")
h.ori <- heatmap.2(x, trace="none", distfun=dist.fn, hclustfun=hclust.com,dendrogram = "row",main = "Fig1")
h.ori$rowInd
#> [1] 1 3 5 4 2
Then we format it as an hclust object, which we feed into the scales. The scales should (in theory) automatically sort the variables according to the clustering. I'm adding the dendrograms on every side of the plot, so you can choose which ones you really want.
# Plot prep: the hclust object and the correlation matrix in long format
clust <- as.hclust(h.ori$rowDendrogram)
df <- reshape2::melt(cor(t(x)))
ggplot(df, aes(Var1, Var2, fill = value)) +
  geom_raster() +
  scale_fill_gradient2(low = "red", mid = "white", high = "yellow") +
  # Primary axes: dendrograms on the bottom and left, ordered by the clustering
  scale_x_dendrogram(hclust = clust) +
  scale_y_dendrogram(hclust = clust) +
  # Secondary axes: the same dendrograms on the top and right
  guides(
    x.sec = guide_dendro(dendro = ggdendro::dendro_data(clust), position = "top"),
    y.sec = guide_dendro(dendro = ggdendro::dendro_data(clust), position = "right")
  ) +
  coord_equal()
One caveat remains: there is no good control over the labels yet. Let me know if you run into any trouble with the function so I can maybe improve it.
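If the extra packages are not an option, a similar picture can probably be had from base R's heatmap(), which accepts a ready-made dendrogram for both rows and columns. This is only a sketch under assumptions: it rebuilds the tree with hclust rather than reusing h.ori$rowDendrogram (which could be passed as Rowv and Colv instead), and the names cmat, dend and res are introduced here for illustration.

```r
set.seed(2)
x <- matrix(rnorm(100), nrow = 5)

# Correlation matrix between rows, and the tree built from 1 - correlation
cmat <- cor(t(x))
dend <- as.dendrogram(hclust(as.dist(1 - cmat), method = "complete"))

colfunc <- colorRampPalette(c("red", "white", "yellow"))

# Supplying the same dendrogram for rows and columns keeps both axes in the
# same order; scale = "none" plots the raw correlations
res <- heatmap(cmat, Rowv = dend, Colv = dend, symm = TRUE,
               scale = "none", col = colfunc(20),
               main = "Fig2 with dendrograms")

# Row order used in the plot (identical to the column order)
res$rowInd
```

Because the dendrogram is supplied directly, heatmap() does not reorder it, so the plotted order is exactly the leaf order of dend.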
Good luck!