r string ggplot2 dna-sequence

Plot letter strings in a ggplot2 plot


I have the following data frame:

readname <- c("tic", "tac", "toe")
sequence <- c("TTTTTTTTATTTTTA","TTTTCTTTTTTTTT","GTTTTTTT")

df <- data.frame(readname, sequence)

I want to plot it with ggplot2 so that the y axis shows the "readname" column and the strings in the "sequence" column are drawn horizontally, one below the other, all right-aligned. I would also like to assign a color to every letter (e.g. A - red, T - blue, C - green, G - yellow).

I tried geom_text(), but I get either an empty plot, every string plotted on the same level (collapsed), or strings plotted on separate levels but unaligned. The same happens with geom_label().

One of my attempts:

ggplot2::ggplot(df, aes(x=sequence, y=readname)) + ggplot2::geom_label(label=sequence)

Here is roughly what I would like to achieve (mocked up in a text editor):

[image: text-editor mock-up of the desired plot]


Solution

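  • library(tidyr)
    library(dplyr)
    library(ggplot2)
    
    df |>
      # split each sequence into letters, reversed so that pos 1 is the last letter
      mutate(letters = lapply(strsplit(sequence, split = ""), rev)) |>
      unnest(letters) |>
      # number the letters within each read, counting from the right end
      mutate(pos = row_number(), .by = c(readname, sequence)) |>
      # subtracting from the overall max(pos) puts the last letter of every read at the same x
      ggplot(aes(x = max(pos) - pos, y = readname, color = letters, label = letters)) +
      geom_text() +
      # named values tie each color to its letter, whatever the level order
      scale_color_manual(guide = "none",
                         values = c(A = "red3", C = "green4", G = "orange", T = "blue")) +
      # keep the reads in their original order, top to bottom
      scale_y_discrete(limits = rev(df$readname)) +
      theme_minimal() +
      theme(
        panel.grid = element_blank(),
        axis.text.x = element_blank(),
        axis.title = element_blank()
      )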
    

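To make the reshaping step easier to inspect, here is a minimal base R sketch of the same idea. It is not the tidyr/dplyr code from the answer, only an illustration of the one-row-per-letter table that gets plotted. The names `chars`, `long` and `x` are made up for this example, and `x` is assumed to count leftwards from 0 at the right edge, so the last letter of every read shares the same position.

```r
# Same example data as in the question
df <- data.frame(
  readname = c("tic", "tac", "toe"),
  sequence = c("TTTTTTTTATTTTTA", "TTTTCTTTTTTTTT", "GTTTTTTT"),
  stringsAsFactors = FALSE  # strsplit() needs character input, not factors
)

# One character vector per read
chars <- strsplit(df$sequence, split = "")

# One row per letter. For a read of length n, x runs from -(n - 1) to 0,
# which right-aligns all reads at x = 0.
long <- data.frame(
  readname = rep(df$readname, lengths(chars)),
  letters  = unlist(chars),
  x        = unlist(lapply(lengths(chars), function(n) seq_len(n) - n)),
  stringsAsFactors = FALSE
)

head(long)
```

A table like `long` could then be passed to `ggplot(long, aes(x = x, y = readname, label = letters, color = letters)) + geom_text()`, with the same scales and theme as above.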