r dataframe ggplot2

How can I create a stacked barchart with this data using ggplot2?


This is the data I'm working with:

Station Salinity CentricD PennateD Dinoflag MarineFlag Cilliates
A3 18.3 181000 26500 1000 15500 2250
A6 27.4 584666.6667 4666.666667 11666.66667 0 61333.33333
A8 25.7 625071.4286 2000 74000 294.1176471 1907.563025
B 29.77785714 503693.8776 2000 6642.857143 7642.857143 5622.44898
C 31.283 266991.5966 5285.714286 10714.28571 71352.94118 12067.22689
D 32.21625 349375 6437.5 6142.857143 39651.78571 4339.285714
E 32.23 379200 466.6666667 3714.285714 12228.57143 4504.761905
F 32.8 559000 0 333.3333333 0 11000
G 33.185 209276.7857 2125 5714.285714 27937.5 3062.5
H 33.67 98714.28571 1812.5 7125 6410.714286 7750
I 34.33294118 113302.521 1764.705882 40142.85714 5588.235294 9260.504202
J 34.537 68142.85714 1000 12842.85714 20228.57143 5271.428571

I want to make a stacked bar chart with 'Station' on the x-axis and each type of phytoplankton stacked on top of the others within each station, so that it shows both how many phytoplankton there are per station and what the composition is. I just don't know how to do that. Looking at geom_bar(), I need to specify a single 'fill' variable, but I don't have just one: I have 5 types of phytoplankton, each in its own column.

I'm sure that this is just a data formatting issue, but I can't find any examples of how to properly format it. Thanks in advance.


Solution

  • You first need to pivot the data into long format. Then you can plot the pivoted values on the y-axis and use the pivoted column names as the fill variable. Here's an example.
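
    Before the full example, a quick sketch of what "long format" means here may help. It uses only two stations copied from the question's data, and the object names small and long are placeholders chosen for this illustration:

```r
library(tidyr)

# Two stations copied from the question's data, to keep the output short
small <- tibble::tribble(
  ~Station, ~Salinity, ~CentricD, ~PennateD, ~Dinoflag, ~MarineFlag, ~Cilliates,
  "A3",     18.3,      181000,    26500,     1000,        15500,       2250,
  "F",      32.8,      559000,    0,         333.3333333, 0,           11000
)

# One row per Station/phytoplankton pair; Station and Salinity are repeated
long <- pivot_longer(small, CentricD:Cilliates,
                     names_to = "phyto", values_to = "val")

dim(long)    # 10 rows (2 stations x 5 groups) and 4 columns
names(long)  # "Station" "Salinity" "phyto" "val"
```

    The single phyto column is what ggplot2 can map to fill, which is why the wide layout with five separate columns does not work directly.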

    Original Data

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    dat <- tibble::tribble(
      ~Station , ~Salinity    , ~CentricD    , ~PennateD    , ~Dinoflag    , ~MarineFlag  , ~Cilliates   ,
    "A3"      , 18.3        , 181000      , 26500       , 1000        , 15500       , 2250        ,
    "A6"      , 27.4        , 584666.6667 , 4666.666667 , 11666.66667 , 0           , 61333.33333 ,
    "A8"      , 25.7        , 625071.4286 , 2000        , 74000       , 294.1176471 , 1907.563025 ,
    "B"       , 29.77785714 , 503693.8776 , 2000        , 6642.857143 , 7642.857143 , 5622.44898  ,
    "C"       , 31.283      , 266991.5966 , 5285.714286 , 10714.28571 , 71352.94118 , 12067.22689 ,
    "D"       , 32.21625    , 349375      , 6437.5      , 6142.857143 , 39651.78571 , 4339.285714 ,
    "E"       , 32.23       , 379200      , 466.6666667 , 3714.285714 , 12228.57143 , 4504.761905 ,
    "F"       , 32.8        , 559000      , 0           , 333.3333333 , 0           , 11000       ,
    "G"       , 33.185      , 209276.7857 , 2125        , 5714.285714 , 27937.5     , 3062.5      ,
    "H"       , 33.67       , 98714.28571 , 1812.5      , 7125        , 6410.714286 , 7750        ,
    "I"       , 34.33294118 , 113302.521  , 1764.705882 , 40142.85714 , 5588.235294 , 9260.504202 ,
    "J"       , 34.537      , 68142.85714 , 1000        , 12842.85714 , 20228.57143 , 5271.428571 )
    

    Here, I use pivot_longer() from tidyr to gather the five phytoplankton columns into a name column (phyto) and a value column (val), then plot the raw values by station and phytoplankton type. Note that if you are providing the y value directly (rather than letting geom_bar() count rows), you need to use stat="identity" in geom_bar().

    dat %>% 
      pivot_longer(CentricD:Cilliates, names_to = "phyto", values_to = "val") %>% 
      ggplot(aes(x=Station, y=val, fill = phyto)) + 
      geom_bar(stat="identity") + 
      theme_bw()
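
    As a possible alternative, ggplot2 also provides geom_col(), which takes the y values as given and so behaves like geom_bar(stat="identity"). A minimal sketch, using three stations copied from the data above (the names three, long and p_col are placeholders for this example):

```r
library(tidyr)
library(ggplot2)

# Three stations copied from the original data (Salinity omitted for brevity)
three <- tibble::tribble(
  ~Station, ~CentricD,   ~PennateD, ~Dinoflag,   ~MarineFlag, ~Cilliates,
  "A3",     181000,      26500,     1000,        15500,       2250,
  "F",      559000,      0,         333.3333333, 0,           11000,
  "H",      98714.28571, 1812.5,    7125,        6410.714286, 7750
)

long <- pivot_longer(three, CentricD:Cilliates,
                     names_to = "phyto", values_to = "val")

# geom_col() uses the supplied y values, so no stat argument is needed;
# bars are stacked by default
p_col <- ggplot(long, aes(x = Station, y = val, fill = phyto)) +
  geom_col() +
  theme_bw()

p_col
```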
    

    If you would rather show proportions so that every bar has the same height, you could compute each value's share of its Station total first and then plot that variable instead. Note that pct here is a proportion between 0 and 1, not a percentage.

    
    dat %>% 
      pivot_longer(CentricD:Cilliates, names_to = "phyto", values_to = "val") %>% 
      group_by(Station) %>% 
      mutate(pct = val/sum(val)) %>% 
      ggplot(aes(x=Station, y=pct, fill = phyto)) + 
      geom_bar(stat="identity") + 
      theme_bw()
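
    Another way to get equal-height bars, sketched here as an alternative rather than as the method used above, is to let ggplot2 do the rescaling with position = "fill" and to label the axis as percentages with scales::percent. The example again uses three stations copied from the data, and the names three, long and p_fill are placeholders:

```r
library(tidyr)
library(ggplot2)

# Three stations copied from the original data (Salinity omitted for brevity)
three <- tibble::tribble(
  ~Station, ~CentricD,   ~PennateD, ~Dinoflag,   ~MarineFlag, ~Cilliates,
  "A3",     181000,      26500,     1000,        15500,       2250,
  "F",      559000,      0,         333.3333333, 0,           11000,
  "H",      98714.28571, 1812.5,    7125,        6410.714286, 7750
)

long <- pivot_longer(three, CentricD:Cilliates,
                     names_to = "phyto", values_to = "val")

# position = "fill" rescales each stack to a total height of 1,
# so no group_by()/mutate() step is needed
p_fill <- ggplot(long, aes(x = Station, y = val, fill = phyto)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  theme_bw()

p_fill
```

    The explicit group_by()/mutate() version keeps the pct column available for further use (for example, in a table), whereas position = "fill" only changes how the bars are drawn.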
    

    Created on 2024-12-04 with reprex v2.1.0