This question is very similar to some other questions on here but I can't crack it. The issue comes as I need to reshape my data.
I have microbiome count data and want to make a stacked bar chart. The charts are then grouped according to a qualitative variable. I would like each higher taxonomic group to have its own colour, with a continuous gradient across the families within that group. Similar to this:
I have been following these two questions: "How to creat a bar graph of microbiota data with one color for higher taxonomic rank and gradient color" and "Stacked barplot with colour gradients for each bar".
Here is an example of my data:
ID Group Family3 Family4 Family5 Family6 Family7 Family8 Family9 Family10
1 1 1 38 73 60 20 33 71 83 40
2 2 1 96 16 88 23 19 70 44 77
3 3 2 69 99 80 60 55 76 99 92
4 4 2 82 91 91 71 79 79 12 38
5 5 3 41 83 77 84 70 37 79 92
I have IDs, a group and then the various families. My real dataset has more columns and rows. Script to make the example data:
# Set seed for reproducibility
set.seed(123)
# Create the data frame
df <- data.frame(
ID = 1:5,
Group = c(1, 1, 2, 2, 3)
)
# Add columns Family3 to Family10 with random values between 0 and 100
for (i in 3:10) {
df[[paste0("Family", i)]] <- sample(0:100, nrow(df), replace = TRUE)
}
# Print the resulting data frame
print(df)
I have a separate dataframe with the Phylum and Family information:
df_taxa <- data.frame(Phylum=c("Phyla1", "Phyla2", "Phyla3", "Phyla2", "Phyla2", "Phyla2", "Phyla1", "Phyla3", "Phyla1", "Phyla1"), Family=c("Family1", "Family8", "Family9", "Family2", "Family7", "Family6", "Family10", "Family3", "Family5", "Family4"))
There are some steps I do beforehand to remove columns with low counts and filter the df_taxa so it only contains the Phylum/Family info from the columns that remain after removing the low count columns.
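The pre-filtering step is not shown in the question. A minimal base R sketch of one way it could look is given here, with made-up counts and a hypothetical cut-off `min_total` on the column totals; the names `df_filtered` and `df_taxa_filtered` are illustrative only.

```r
# Toy data with the same layout as the question (counts are made up)
df <- data.frame(
  ID = 1:3,
  Group = c(1, 1, 2),
  Family3 = c(38, 96, 69),
  Family4 = c(0, 1, 0),
  Family5 = c(60, 88, 80)
)
df_taxa <- data.frame(
  Phylum = c("Phyla3", "Phyla1", "Phyla1", "Phyla2"),
  Family = c("Family3", "Family4", "Family5", "Family8")
)

min_total <- 10  # hypothetical cut-off, not taken from the question
id_cols <- c("ID", "Group")
family_cols <- setdiff(names(df), id_cols)

# Keep only the family columns whose total count reaches the cut-off
keep <- family_cols[colSums(df[family_cols]) >= min_total]
df_filtered <- df[c(id_cols, keep)]

# Restrict the taxonomy table to the families that survived the filter
df_taxa_filtered <- df_taxa[df_taxa$Family %in% keep, ]
```

Filtering `df_taxa` with the same `keep` vector means the palette is later built only for families that are actually plotted.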
This is the script I have been using to generate my stacked bar charts:
library(reshape2)
library(ggplot2)
library(ggforce) # provides facet_grid_paginate
df_melt <- reshape2::melt(df,id.vars=c("ID", "Group")) #reshape dataframe for ggplot
df_cols <- ColourPalleteMulti(df_taxa, "Phylum", "Family") # Generate colours. This function is found in the second link.
ggplot(df_melt, aes(ID,value, fill=variable)) + geom_bar(position="fill", stat="identity") + scale_fill_manual("", values=df_cols) + facet_grid_paginate(. ~ Group, scales ="free") #Plot with ggplot
This is what the plot looks like:
The issue is that it is not splitting the colours according to the Phyla. I have looked at the other questions, and they say it is easier to add an additional column called group to the original dataframe, which is then used as the fill option:
#Example given from second link
df$group <- paste0(df$color, "-", df$clarity, sep = "")
# Build the colour palette
colours <-ColourPalleteMulti(df, "color", "clarity")
# Plot results
ggplot(df, aes(color)) +
geom_bar(aes(fill = group), colour = "grey") +
scale_fill_manual("Subject", values=colours, guide = "none")
I don't see how I can do this, as I melt the data and use the count data variable as the fill option for ggplot.
I understand you want to create sub-palettes for your data based on the group Phylum and color the families within each Phylum with a separate palette.
For this you could:
1. Create a column phylum_family that combines Phylum and Family in your df_melt and df_taxa.
2. Order df_taxa by phylum_family.
3. Plot df_melt and fill by phylum_family.
4. Set scale_fill_manual to the custom color palette created from the combinations in df_taxa.
This gives one color gradient per Phylum:
library(reshape2)
library(ggplot2)
library(ggforce)
library(dplyr)
df <- data.frame(
ID = 1:5,
Group = c(1, 1, 2, 2, 3)
)
# Add columns Family3 to Family10 with random values between 0 and 100
for (i in 3:10) {
df[[paste0("Family", i)]] <- sample(0:100, nrow(df), replace = TRUE)
}
df_taxa <- data.frame(Phylum=c("Phyla1", "Phyla2", "Phyla3", "Phyla2", "Phyla2", "Phyla2", "Phyla1", "Phyla3", "Phyla1", "Phyla1"),
Family=c("Family1", "Family8", "Family9", "Family2", "Family7", "Family6", "Family10", "Family3", "Family5", "Family4"))
df_melt <- reshape2::melt(df, id.vars=c("ID", "Group")) %>%
left_join(df_taxa[,c("Family", "Phylum")], by = c("variable" = "Family")) %>%
mutate(phylum_family = paste(Phylum, variable, sep = "-"))
# Multi-group colour palette: one hue per group, one shade per subgroup
ColourPalleteMulti <- function(df, group, subgroup){
# Find how many colour categories to create and the number of colours in each
categories <- aggregate(as.formula(paste(subgroup, group, sep="~" )), df, function(x) length(unique(x)))
category.start <- (scales::hue_pal(l = 100)(nrow(categories))) # Set the top (lightest end) of each gradient
category.end <- (scales::hue_pal(l = 40)(nrow(categories))) # Set the bottom (darkest end)
# Build the colour palette: one ramp per category, light to dark
colours <- unlist(lapply(1:nrow(categories),
function(i){
colorRampPalette(colors = c(category.start[i], category.end[i]))(categories[i,2])}))
colours # return the palette explicitly rather than invisibly
}
# We'll still use ColourPalleteMulti but now on our mapping dataframe
df_taxa$phylum_family <- paste(df_taxa$Phylum, df_taxa$Family, sep = "-")
df_taxa <- arrange(df_taxa, phylum_family) # order
df_cols <- setNames(ColourPalleteMulti(df_taxa, "Phylum", "Family"), df_taxa$phylum_family)
# Now plot with the combined phylum-family as the fill
ggplot(df_melt, aes(ID, value, fill = phylum_family)) +
geom_bar(position = "fill", stat = "identity") +
scale_fill_manual("", values = df_cols) +
facet_grid_paginate(. ~ Group, scales = "free")
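Since scale_fill_manual matches a named values vector to the fill levels by name, it can be worth checking that every phylum_family level in the plotted data has an entry in the palette. A small sketch of such a check, using made-up labels and colours: `fill_levels` and `palette` are stand-ins for `unique(df_melt$phylum_family)` and `df_cols`, and are not part of the script above.

```r
# Stand-in for unique(df_melt$phylum_family); labels are made up
fill_levels <- c("Phyla1-Family4", "Phyla1-Family5",
                 "Phyla2-Family8", "Phyla3-Family3")

# Stand-in for df_cols: a named vector of colours (hex codes are arbitrary)
palette <- c(
  "Phyla1-Family4" = "#FFC5D0",
  "Phyla1-Family5" = "#8E3A47",
  "Phyla2-Family8" = "#B7E0A0",
  "Phyla3-Family3" = "#B0D8FF",
  "Phyla3-Family9" = "#1F5A8A"
)

# Levels that would be plotted without an assigned colour
missing_colours <- setdiff(fill_levels, names(palette))

# Palette entries with no matching level (harmless, just unused)
unused_colours <- setdiff(names(palette), fill_levels)

# Fail early if any plotted level lacks a colour
stopifnot(length(missing_colours) == 0)
```

Unused palette entries, such as families present in df_taxa but filtered out of the count table, do not break the plot; missing ones are the case to catch.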
Let me know if any of this needs further explanation or if I misunderstood you.
I found this for grouping legends in ggplot. But you can also improvise some brackets by using cowplot. This can be improved, as it's very manual at the moment: the bracket colours and y positions are set by hand.
p <- ggplot(df_melt, aes(ID, value, fill = phylum_family)) +
geom_bar(position = "fill", stat = "identity") +
scale_fill_manual("", values = df_cols) +
facet_grid_paginate(. ~ Group, scales = "free") +
theme(plot.margin = margin(5, 80, 5, 5, "pt"))
library(cowplot)
p <- ggdraw(p)
add_taxonomic_bracket <- function(plot, label, color, y_min, y_max,
x_bracket = 0.91, bracket_width = 0.02,
label_offset = 0.03, size = 1) {
x_end <- x_bracket - bracket_width
y_mid <- (y_min + y_max) / 2
plot +
draw_line(
x = c(x_bracket, x_bracket),
y = c(y_min, y_max),
color = color,
size = size
) +
draw_line(
x = c(x_bracket, x_end),
y = c(y_max, y_max),
color = color,
size = size
) +
draw_line(
x = c(x_bracket, x_end),
y = c(y_min, y_min),
color = color,
size = size
) +
draw_label(
label,
x = x_end,
y = y_mid,
color = color,
hjust = -0.6,
fontface = "italic"
)
}
p <- add_taxonomic_bracket(p, "Phyla 1", "#E65100", 0.53, 0.62)
p <- add_taxonomic_bracket(p, "Phyla 2", "#388E3C", 0.42, 0.53)
p <- add_taxonomic_bracket(p, "Phyla 3", "#1565C0", 0.36, 0.42)
p