rmatrixcorrelation

convert source target value dataframe into a correlation matrix


I have a data.frame containing different combinations of a group and count values where both of them exist. I need to plot or create a matrix similar to correlation matrix. I have come up with a simple example here

dat <- data.frame(source = c('A','A','A','B','B','C'),
              target = c('B','C','D','C','D','D'),
              count = c(4,5,6,3,3,5))

> dat
  source target count
1      A      B     4
2      A      C     5
3      A      D     6
4      B      C     3
5      B      D     3
6      C      D     5

How do I get a matrix like this?? and plot this matrix enter image description here


Solution

  • This uses dplyr, tapply and pheatmap:

    library(dplyr)
    
    by <- c("source", "target")
    m <- dat %>%
      mutate(across(any_of(by), \(x) factor(x, sort(unique(unlist(.[by])))))) %>%
      { tapply(.$count, .[by], c, default = 0) } %>%
      { . + t(.) + diag(ncol(.)) }
    m
    

    giving

          target
    source A B C D
         A 1 4 5 6
         B 4 1 3 3
         C 5 3 1 5
         D 6 3 5 1
    
    
    library(pheatmap)
    pheatmap(m, display_numbers = TRUE, cluster_rows = FALSE, cluster_cols = FALSE)
    

    (continued after image) screenshot

    We can also try the Bioconductor ComplexHeatmap package.

    library(circlize)
    library(ComplexHeatmap)
    library(RColorBrewer)
    
    col_fun <-  colorRamp2(c(0, 3, 6), brewer.pal(n = 3, name = "RdYlBu"))
    Heatmap(m, name = "m", col = col_fun,
      cluster_rows = FALSE,
      cluster_columns = FALSE,
      column_names_rot = 0,
      row_names_gp = gpar(fontsize = 20),
      column_names_gp = gpar(fontsize = 20),
      cell_fun = function(j, i, x, y, width, height, fill) {
        grid.text(sprintf("%.0f", m[i, j]), x, y, gp = gpar(fontsize = 30))})
    

    (continued after image) screenshot

    or without color

    Heatmap(m, name = "m", 
      rect_gp = gpar(type = "none"),
      cluster_rows = FALSE,
      cluster_columns = FALSE,
      column_names_rot = 0,
      row_names_gp = gpar(fontsize = 20),
      column_names_gp = gpar(fontsize = 20),
      show_heatmap_legend = FALSE,
      cell_fun = function(j, i, x, y, width, height, fill) {
        grid.rect(x = x, y = y, width = width, height = height, gp = gpar(fill = NA))
        grid.text(sprintf("%.0f", m[i, j]), x, y, gp = gpar(fontsize = 30))})
    

    (continued after image) screenshot

    Another possibility is a balloonplot using the gplots package.

    library(gplots)
    balloonplot(as.table(t(m)), show.margins = FALSE, cum.margins = FALSE, main = "m")
    

    (continued after image) screenshot

    The ggpubr package also has a balloon plot function:

    library(ggpubr)
    
    ggballoonplot(m, show.label = TRUE, rotate.x.text = 0)
    

    screenshot