rggplot2graphical-logo

how draw sequence logo plot based on data table in R?


I have a data table where I would like to plot the sequence logo based on my Input data.

Input:

data <- data.frame(
Cns = c("H", "H", "H", "Q", "D", "D", "I", "S", "M", "P"),
variable = c("H", "Q", "R", "Q", "D", "N", "I", "S", "M", "P"),
rate = c(99.1, 0.236, 0.708, 100, 99.3, 0.708, 100, 100, 100, 100)
)

How can I draw a logo plot based on the Input (not alignment files), while having "Cns" on the x-axis, "rate" on the y-axis and a "variable" column as the logo and their size change based on the rate column?


Solution

  • This is a bit of a faff. There was a package for creating sequence logos in R, but it was removed from CRAN last month. You can install and load the latest working version by doing:

    # install('devtools') # Uncomment this line if you don't have devtools installed
    devtools::install_version('ggseqlogo', '0.1')
    library(ggseqlogo)
    

    You then need to get your data into matrix format, which requires a bit of manipulation:

    data <- data.frame(
      Cns = c("H", "H", "H", "Q", "D", "D", "I", "S", "M", "P"),
      variable = c("H", "Q", "R", "Q", "D", "N", "I", "S", "M", "P"),
      rate = c(99.1, 0.236, 0.708, 100, 99.3, 0.708, 100, 100, 100, 100)
    )
    
    df <- expand.grid(Cns = unique(data$Cns), variable = unique(data$variable))
    
    df$rate <- unlist(Map(function(x, y) {
      i <- which(data$Cns == x & data$variable == y)
      if(length(i) == 0) return(0) else sum(data$rate[i])
    }, df$Cns, df$variable))
      
    mat <- matrix(df$rate, nrow = length(unique(data$variable)), byrow = TRUE,
                  dimnames = list(unique(data$variable), unique(data$Cns)))
    

    If you want a colorful result that plots the letter heights according to rate, you can then do:

    p <- ggseqlogo(mat, method = 'custom', seq_type = 'other') 
    p$layers[[1]]$mapping <- aes(x, y, fill = letter, group = group_by)
    p + scale_fill_discrete() +
      scale_x_continuous(breaks = seq_along(unique(data$Cns)),
                         labels = unique(data$Cns))
    

    enter image description here