r, ggplot2, colors, categories, stacked-area-chart

Assign colors to categories in a stacked area graph in ggplot


I have to make several stacked area graphs in R with a common list of categories, but not all of the categories will be present in every graph. So I created a vector assigning a colour to each category and passed it to scale_fill_manual. It seems to work fine, but the first category stays blank. Does anyone know how to solve this?

A (short) example of my data and the code I have used:

df <- structure(list(t = c(4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 
4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 
8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 
12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12, 4, 8, 12
), Orden = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 5L, 5L, 5L, 6L, 
6L, 6L, 10L, 10L, 10L, 11L, 11L, 11L, 12L, 12L, 12L, 13L, 13L, 
13L, 14L, 14L, 14L, 15L, 15L, 15L, 7L, 7L, 7L, 8L, 8L, 8L, 16L, 
16L, 16L, 17L, 17L, 17L, 18L, 18L, 18L, 19L, 19L, 19L, 21L, 21L, 
21L, 22L, 22L, 22L, 23L, 23L, 23L, 24L, 24L, 24L, 25L, 25L, 25L, 
27L, 27L, 27L, 28L, 28L, 28L), .Label = c("Chaetocerotanae incertae sedis", 
"Corethrales", "Coscinodiscales", "Fragilariales", "Leptocylindrales", 
"Licmophorales", "Melosirales", "Naviculales", "Rhaphoneidales", 
"Rhizosoleniales", "Surirellales", "Thalassionematales", "Thalassiosirales", 
"Triceratiales", "Otras diatomeas centrales", "Otras células o cadenas lineales", 
"Cadenas de células pequeñas", "Otras diatomeas pennadas", "Otras diatomeas", 
"Actiniscales", "Dinophysiales", "Gonyaulacales", "Hemiaulales", 
"Noctilucales", "Peridiniales", "Prorocentrales", "Pyrocystales", 
"Otros dinoflagelados", "Appendicularia", "Choreotrichida", "Ciliophora", 
"Cirripedia", "Coccolithophores", "Copepods", "Cyanobacteria", 
"Dictyochales", "Fish egg", "Others", "Radiozoa", "Tintinnids", 
"Foraminifera"), class = "factor"), percentage = c(0.001, 0.002, 
0.005, 0, 0, 0, 0, 0.003, 0.001, 0.003, 0, 0, 0.033, 0.373, 0.169, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.004, 0.017, 0.015, 0, 0, 
0, 0, 0, 0.002, 0.239, 0.245, 0.282, 0.681, 0.243, 0.382, 0.018, 
0, 0.039, 0, 0.001, 0, 0, 0.001, 0, 0.003, 0.044, 0.007, 0, 0, 
0, 0, 0.001, 0, 0.003, 0.019, 0.002, 0, 0, 0, 0.016, 0.051, 0.096
)), row.names = c(NA, -69L), class = c("tbl_df", "tbl", "data.frame"
))

orders_colours <- c("Chaetocerotanae incertae sedis" = "#595959", "Corethrales" = "#A6A6A6", "Coscinodiscales" = "#D9D9D9", "Fragilariales" = "#C5B8D0", "Leptocylindrales" = "#A18BB3", "Licmophorales" = "#775C8E", "Melosirales" = "#533569", "Naviculales" = "#251642", "Rhaphoneidales" = "#0A1E3E", "Rhizosoleniales" = "#123A74", "Surirellales" = "#1D81A2", "Thalassionematales" = "#004765", "Thalassiosirales" = "#53AAC9", "Triceratiales" = "#47C1B2", "Otras diatomeas centrales" = "#339A9B", "Otras células o cadenas lineales" = "#1C5558", "Cadenas de células pequeñas" = "#257085", "Otras diatomeas pennadas" = "#005D71", "Otras diatomeas" = "#163E4A", "Actiniscales" = "#FFBB7F", "Dinophysiales" = "#FFDC6C", "Gonyaulacales" = "#FFFBB1", "Hemiaulales" = "#FFE59C", "Noctilucales" = "#FFA126", "Peridiniales" = "#E65340", "Prorocentrales" = "#CC3E2F", "Pyrocystales" = "#731813", "Otros dinoflagelados" = "#390B09")

ggplot(df, aes(fill=Orden, y=percentage, x=t)) + 
  geom_area() +
  scale_fill_manual( values = orders_colours) +
  theme_light (base_size = 12, base_family = "Times")+
  theme(legend.position="bottom") +
  xlab("") + 
  ylab("")

And this is the kind of graph I get. In this case the percentage of "Chaetocerotanae incertae sedis" is too small to be visible in the graph; however, as you can see in the legend, it has no colour, although it should be dark grey...


Thanks in advance for the help!


Solution

  • I think it would be safer to match the colors to the categories within the data frame and then map them with scale_fill_identity. This gives you better control over the mapping, makes mismatches easier to debug, and works whether or not a given group is present in a particular graph.

    library(ggplot2)
    ## as per your question
    # df <- ...
    # orders_colours <- ...
    
    ## change from here
    
    # for the label of your legend
    label_colours <- setNames(names(orders_colours), orders_colours)
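    # match the colors with the respective "Orden"; index by label, because
    # indexing with the factor itself would use its integer codes (positions)
    df$color <- unname(orders_colours[as.character(df$Orden)])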
    
    # now change fill to color and use scale_identity
    ggplot(df, aes(fill=color, y=percentage, x=t)) + 
      geom_area() +
    ## in scale_identity you need to add the legend via guide_legend, 
    ## and set the limits for a correct legend order
      scale_fill_identity(guide = guide_legend(), limits = orders_colours, 
                          labels = label_colours ) +
      theme_light (base_size = 12, base_family = "Times")+
      theme(legend.position="bottom", 
            legend.key.size = unit(.1, "in")) +
      ## use NULL rather than "" to drop the axis titles
      labs(x = NULL, y = NULL)
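
To illustrate the point about debugging mismatches, here is a minimal base R sketch. The toy vectors `orden` and `colours` are made up for the example and are not the question's data. It shows that indexing a named vector with a factor looks values up by position (the factor's integer codes), while indexing with the labels looks them up by name, and how to list categories that occur in the data but have no colour. In the question's data the colour vector happens to follow the order of the factor levels, so both lookups give the same result there.

```r
# Hypothetical toy data: "D" occurs in the data but has no colour,
# and the colour vector is NOT in the same order as the factor levels.
orden   <- factor(c("B", "C", "D"), levels = c("A", "B", "C", "D"))
colours <- c(A = "#595959", C = "#D9D9D9", B = "#A6A6A6")

# Indexing with the factor uses its integer codes (2, 3, 4), i.e. positions,
# so "B" silently receives the colour stored in second place (that of "C").
by_position <- unname(colours[orden])

# Indexing with the labels looks the colours up by name.
by_name <- unname(colours[as.character(orden)])

# Categories present in the data that have no colour assigned.
missing_colours <- setdiff(levels(droplevels(orden)), names(colours))

print(by_position)      # "#D9D9D9" "#A6A6A6" NA
print(by_name)          # "#A6A6A6" "#D9D9D9" NA
print(missing_colours)  # "D"
```

Running the `setdiff()` check on each data frame before plotting should flag any category that would otherwise end up with a missing fill.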