Tags: r, ggplot2, colors, reshape2

Stacked bar chart of relative abundance with one colour for higher taxonomic rank and gradient colour


This question is very similar to some other questions on here but I can't crack it. The issue comes as I need to reshape my data.

I have count data from microbiome data and want to make a stacked bar chart. The charts are then grouped according to a qualitative variable. I would like the higher taxonomic groups to be a certain colour and there be a continuous gradient within those groups. Similar to this:

[Image: example of the desired plot]

I have been following these two questions: "How to creat a bar graph of microbiota data with one color for higher taxonomic rank and gradient color" and "Stacked barplot with colour gradients for each bar".

Here is an example of my data:

  ID Group Family3 Family4 Family5 Family6 Family7 Family8 Family9 Family10
1  1     1      38      73      60      20      33      71      83      40
2  2     1      96      16      88      23      19      70      44      77
3  3     2      69      99      80      60      55      76      99      92
4  4     2      82      91      91      71      79      79      12      38
5  5     3      41      83      77      84      70      37      79      92

I have IDs, a group and then the various families. My real dataset has more columns and rows. Script to make the example data:

# Set seed for reproducibility
set.seed(123)

# Create the data frame
df <- data.frame(
  ID = 1:5,
  Group = c(1, 1, 2, 2, 3)
)

# Add columns Family3 to Family10 with random values between 0 and 100
for (i in 3:10) {
  df[[paste0("Family", i)]] <- sample(0:100, nrow(df), replace = TRUE)
}

# Print the resulting data frame
print(df)

I have a separate dataframe with the Phylum and Family information:

df_taxa <- data.frame(Phylum=c("Phyla1", "Phyla2", "Phyla3", "Phyla2", "Phyla2", "Phyla2", "Phyla1", "Phyla3", "Phyla1", "Phyla1"), Family=c("Family1", "Family8", "Family9", "Family2", "Family7", "Family6", "Family10", "Family3", "Family5", "Family4"))

There are some steps I do beforehand to remove columns with low counts and filter the df_taxa so it only contains the Phylum/Family info from the columns that remain after removing the low count columns.
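
The pre-filtering code is not part of this post. As a rough sketch only, assuming a simple total-count cut-off (the threshold `min_total` and the names `df_filtered` and `df_taxa_filtered` are hypothetical), it could look something like this in base R:

```r
# Rebuild the example data from above so the sketch runs on its own
set.seed(123)
df <- data.frame(ID = 1:5, Group = c(1, 1, 2, 2, 3))
for (i in 3:10) {
  df[[paste0("Family", i)]] <- sample(0:100, nrow(df), replace = TRUE)
}
df_taxa <- data.frame(
  Phylum = c("Phyla1", "Phyla2", "Phyla3", "Phyla2", "Phyla2",
             "Phyla2", "Phyla1", "Phyla3", "Phyla1", "Phyla1"),
  Family = c("Family1", "Family8", "Family9", "Family2", "Family7",
             "Family6", "Family10", "Family3", "Family5", "Family4")
)

# Hypothetical cut-off: keep families whose total count across samples reaches it
min_total <- 200

family_cols <- setdiff(names(df), c("ID", "Group"))
keep <- family_cols[colSums(df[family_cols]) >= min_total]

# Drop the low-count columns, then keep only the matching taxonomy rows
df_filtered <- df[c("ID", "Group", keep)]
df_taxa_filtered <- df_taxa[df_taxa$Family %in% keep, ]
```

After this step every family left in the taxonomy table has a matching count column, which is what the colour mapping relies on.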

This is the script I have been using to generate my stacked bar charts:

library(reshape2)
library(ggplot2)
library(ggforce) # provides facet_grid_paginate()

# Reshape the data frame to long format for ggplot
df_melt <- reshape2::melt(df, id.vars = c("ID", "Group"))
# Generate colours. This function is found in the second link.
df_cols <- ColourPalleteMulti(df_taxa, "Phylum", "Family")

# Plot with ggplot
ggplot(df_melt, aes(ID, value, fill = variable)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_fill_manual("", values = df_cols) +
  facet_grid_paginate(. ~ Group, scales = "free")

This is what the plot looks like:

The issue is that the colours are not split according to phylum. The answers to the other questions suggest that it is easier to add an additional column called group to the original data frame and then use it as the fill option:

# Example given in the second link
df$group <- paste0(df$color, "-", df$clarity)

# Build the colour palette
colours <- ColourPalleteMulti(df, "color", "clarity")

# Plot results
ggplot(df, aes(color)) + 
  geom_bar(aes(fill = group), colour = "grey") +
  scale_fill_manual("Subject", values=colours, guide = "none")

I don't see how I can do this as I melt the data and I use the count data variable as the fill option for ggplot.

Any help would be greatly appreciated. Thanks.


Solution

  • I understand you want to create sub-palettes for your data based on Phylum, and colour the families within each Phylum with a separate palette.

    For this you could

    1. Create a column phylum_family that combines Phylum and Family in your df_melt
    2. Do the same in your df_taxa
    3. Order df_taxa by phylum_family
    4. Plot df_melt and fill by phylum_family
    5. scale_fill_manual by the custom color palette created with the combinations in df_taxa

    and this will give the plot produced by the code below.

    Code

    library(reshape2)
    library(ggplot2)
    library(ggforce)
    library(dplyr)
    
    set.seed(123) # same seed as in the question, for reproducibility
    df <- data.frame(
      ID = 1:5,
      Group = c(1, 1, 2, 2, 3)
    )
    # Add columns Family3 to Family10 with random values between 0 and 100
    for (i in 3:10) {
      df[[paste0("Family", i)]] <- sample(0:100, nrow(df), replace = TRUE)
    }
    df_taxa <- data.frame(Phylum=c("Phyla1", "Phyla2", "Phyla3", "Phyla2", "Phyla2", "Phyla2", "Phyla1", "Phyla3", "Phyla1", "Phyla1"), 
                          Family=c("Family1", "Family8", "Family9", "Family2", "Family7", "Family6", "Family10", "Family3", "Family5", "Family4"))
    
    df_melt <- reshape2::melt(df, id.vars=c("ID", "Group")) %>%
      left_join(df_taxa[,c("Family", "Phylum")], by = c("variable" = "Family")) %>%
      mutate(phylum_family = paste(Phylum, variable, sep = "-"))
    
    
    # Multi-group colour palette
    
    ColourPalleteMulti <- function(df, group, subgroup){
      
      # Find how many colour categories to create and the number of colours in each
      categories <- aggregate(as.formula(paste(subgroup, group, sep="~" )), df, function(x) length(unique(x)))
      category.start <- (scales::hue_pal(l = 100)(nrow(categories))) # Set the top of the colour palette
      category.end  <- (scales::hue_pal(l = 40)(nrow(categories))) # Set the bottom
      
      # Build the colour palette: one light-to-dark ramp per category
      colours <- unlist(lapply(1:nrow(categories),
                               function(i){
                                 colorRampPalette(colors = c(category.start[i], category.end[i]))(categories[i,2])}))
      colours # return the palette explicitly rather than relying on the invisible value of the assignment
    }
    
    
    # We'll still use ColourPalleteMulti but now on our mapping dataframe
    df_taxa$phylum_family <- paste(df_taxa$Phylum, df_taxa$Family, sep = "-")
    
    df_taxa <- arrange(df_taxa, phylum_family) # order so the names line up with the palette, which is built phylum by phylum
    df_cols <- setNames(ColourPalleteMulti(df_taxa, "Phylum", "Family"), df_taxa$phylum_family)
    
    # Now plot with the combined phylum-family as the fill
    ggplot(df_melt, aes(ID, value, fill = phylum_family)) +
      geom_bar(position = "fill", stat = "identity") +
      scale_fill_manual("", values = df_cols) +
      facet_grid_paginate(. ~ Group, scales = "free")
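
To illustrate why the ordering and naming matter, here is a small base-R sketch of the same idea: one light-to-dark ramp per phylum, named by the phylum-family label, followed by a check that every fill level has a colour. `scale_fill_manual()` matches a named `values` vector by name, so a missing name would leave a level without a colour. The `taxa` table, the `endpoints` hex colours and `fill_levels` are made-up stand-ins, not the objects from the code above.

```r
# Toy taxonomy table (hypothetical stand-in for df_taxa)
taxa <- data.frame(
  Phylum = c("Phyla1", "Phyla2", "Phyla1", "Phyla2", "Phyla1"),
  Family = c("Family4", "Family8", "Family5", "Family6", "Family10")
)
taxa$phylum_family <- paste(taxa$Phylum, taxa$Family, sep = "-")
taxa <- taxa[order(taxa$phylum_family), ]  # keeps each phylum's families together

# Hypothetical light and dark endpoints per phylum
# (the answer derives these from scales::hue_pal() instead)
endpoints <- list(
  Phyla1 = c("#FFD9D6", "#8B1A10"),
  Phyla2 = c("#D6F5D6", "#0B5D1E")
)

# One gradient per phylum, with as many steps as that phylum has families
n_per_phylum <- table(taxa$Phylum)
pal <- unlist(lapply(names(n_per_phylum), function(p) {
  colorRampPalette(endpoints[[p]])(n_per_phylum[[p]])
}))
names(pal) <- taxa$phylum_family

# Fill levels that would appear in the plot
# (stand-in for unique(df_melt$phylum_family))
fill_levels <- c("Phyla1-Family4", "Phyla2-Family8", "Phyla1-Family5")
missing_cols <- setdiff(fill_levels, names(pal))
if (length(missing_cols) > 0) {
  warning("No colour defined for: ", paste(missing_cols, collapse = ", "))
}
```

If `missing_cols` is not empty, a likely cause is a family that is present in the count table but absent from the taxonomy table.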
    

    Let me know if any of this needs further explanation or if I misunderstood you.

    Adding brackets

    I found this for grouping legends in ggplot, but you can also improvise some brackets using cowplot. This can be improved, as it is very manual at the moment: the bracket positions and colours are hard-coded.

    p <- ggplot(df_melt, aes(ID, value, fill = phylum_family)) +
      geom_bar(position = "fill", stat = "identity") +
      scale_fill_manual("", values = df_cols) +
      facet_grid_paginate(. ~ Group, scales = "free") +
      theme(plot.margin = margin(5, 80, 5, 5, "pt")) 
    
    library(cowplot)
    p <- ggdraw(p)
    
    add_taxonomic_bracket <- function(plot, label, color, y_min, y_max, 
                                      x_bracket = 0.91, bracket_width = 0.02, 
                                      label_offset = 0.03, size = 1) {
      x_end <- x_bracket - bracket_width
      y_mid <- (y_min + y_max) / 2
      
      plot + 
        draw_line(
          x = c(x_bracket, x_bracket), 
          y = c(y_min, y_max),
          color = color, 
          size = size
        ) +
        draw_line(
          x = c(x_bracket, x_end), 
          y = c(y_max, y_max),
          color = color, 
          size = size
        ) +
        draw_line(
          x = c(x_bracket, x_end), 
          y = c(y_min, y_min),
          color = color, 
          size = size
        ) +
        draw_label(
          label, 
          x = x_end, 
          y = y_mid, 
          color = color,
          hjust = -0.6,
          fontface = "italic"
        )
    }
    
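    # y_min and y_max are hand-tuned positions in ggdraw's 0-1 canvas coordinates; adjust them for your own plot
    p <- add_taxonomic_bracket(p, "Phyla 1", "#E65100", 0.53, 0.62)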
    p <- add_taxonomic_bracket(p, "Phyla 2", "#388E3C", 0.42, 0.53)
    p <- add_taxonomic_bracket(p, "Phyla 3", "#1565C0", 0.36, 0.42)
    p
    
    
