I have a data table where I would like to plot the sequence logo based on my Input data.
Input:
data <- data.frame(
Cns = c("H", "H", "H", "Q", "D", "D", "I", "S", "M", "P"),
variable = c("H", "Q", "R", "Q", "D", "N", "I", "S", "M", "P"),
rate = c(99.1, 0.236, 0.708, 100, 99.3, 0.708, 100, 100, 100, 100)
)
How can I draw a logo plot based on the Input (not alignment files), while having "Cns" on the x-axis, "rate" on the y-axis and a "variable" column as the logo and their size change based on the rate column?
This is a bit of a faff. There was a package for creating sequence logos in R, but it was removed from CRAN last month. You can install and load the latest working version by doing:
# install('devtools') # Uncomment this line if you don't have devtools installed
devtools::install_version('ggseqlogo', '0.1')
library(ggseqlogo)
You then need to get your data into matrix format, which requires a bit of manipulation:
data <- data.frame(
Cns = c("H", "H", "H", "Q", "D", "D", "I", "S", "M", "P"),
variable = c("H", "Q", "R", "Q", "D", "N", "I", "S", "M", "P"),
rate = c(99.1, 0.236, 0.708, 100, 99.3, 0.708, 100, 100, 100, 100)
)
df <- expand.grid(Cns = unique(data$Cns), variable = unique(data$variable))
df$rate <- unlist(Map(function(x, y) {
i <- which(data$Cns == x & data$variable == y)
if(length(i) == 0) return(0) else sum(data$rate[i])
}, df$Cns, df$variable))
mat <- matrix(df$rate, nrow = length(unique(data$variable)), byrow = TRUE,
dimnames = list(unique(data$variable), unique(data$Cns)))
If you want a colorful result that plots the letter heights according to rate
, you can then do:
p <- ggseqlogo(mat, method = 'custom', seq_type = 'other')
p$layers[[1]]$mapping <- aes(x, y, fill = letter, group = group_by)
p + scale_fill_discrete() +
scale_x_continuous(breaks = seq_along(unique(data$Cns)),
labels = unique(data$Cns))