This is the data I'm working with:
Station | Salinity | CentricD | PennateD | Dinoflag | MarineFlag | Cilliates |
---|---|---|---|---|---|---|
A3 | 18.3 | 181000 | 26500 | 1000 | 15500 | 2250 |
A6 | 27.4 | 584666.6667 | 4666.666667 | 11666.66667 | 0 | 61333.33333 |
A8 | 25.7 | 625071.4286 | 2000 | 74000 | 294.1176471 | 1907.563025 |
B | 29.77785714 | 503693.8776 | 2000 | 6642.857143 | 7642.857143 | 5622.44898 |
C | 31.283 | 266991.5966 | 5285.714286 | 10714.28571 | 71352.94118 | 12067.22689 |
D | 32.21625 | 349375 | 6437.5 | 6142.857143 | 39651.78571 | 4339.285714 |
E | 32.23 | 379200 | 466.6666667 | 3714.285714 | 12228.57143 | 4504.761905 |
F | 32.8 | 559000 | 0 | 333.3333333 | 0 | 11000 |
G | 33.185 | 209276.7857 | 2125 | 5714.285714 | 27937.5 | 3062.5 |
H | 33.67 | 98714.28571 | 1812.5 | 7125 | 6410.714286 | 7750 |
I | 34.33294118 | 113302.521 | 1764.705882 | 40142.85714 | 5588.235294 | 9260.504202 |
J | 34.537 | 68142.85714 | 1000 | 12842.85714 | 20228.57143 | 5271.428571 |
I want to make a stacked bar chart with 'Station' on the x-axis and each type of phytoplankton stacked on top of the others per station, to show both how many phytoplankton there are at each station and what that total is made up of. I just don't know how to do that. Looking at the geom_bar() command, I need to specify a 'fill' variable, but I don't have just one: I have 5 types of phytoplankton that I want to fill the bars with.
I'm sure that this is just a data formatting issue, but I can't find any examples of how to properly format it. Thanks in advance.
You would first have to pivot the data into long format. Then you could make the graph using the pivoted values as the y-axis values and the pivoted variable names as the fill variable. Here's an example.
library(dplyr)
library(tidyr)
library(ggplot2)
dat <- tibble::tribble(
~Station , ~Salinity , ~CentricD , ~PennateD , ~Dinoflag , ~MarineFlag , ~Cilliates ,
"A3" , 18.3 , 181000 , 26500 , 1000 , 15500 , 2250 ,
"A6" , 27.4 , 584666.6667 , 4666.666667 , 11666.66667 , 0 , 61333.33333 ,
"A8" , 25.7 , 625071.4286 , 2000 , 74000 , 294.1176471 , 1907.563025 ,
"B" , 29.77785714 , 503693.8776 , 2000 , 6642.857143 , 7642.857143 , 5622.44898 ,
"C" , 31.283 , 266991.5966 , 5285.714286 , 10714.28571 , 71352.94118 , 12067.22689 ,
"D" , 32.21625 , 349375 , 6437.5 , 6142.857143 , 39651.78571 , 4339.285714 ,
"E" , 32.23 , 379200 , 466.6666667 , 3714.285714 , 12228.57143 , 4504.761905 ,
"F" , 32.8 , 559000 , 0 , 333.3333333 , 0 , 11000 ,
"G" , 33.185 , 209276.7857 , 2125 , 5714.285714 , 27937.5 , 3062.5 ,
"H" , 33.67 , 98714.28571 , 1812.5 , 7125 , 6410.714286 , 7750 ,
"I" , 34.33294118 , 113302.521 , 1764.705882 , 40142.85714 , 5588.235294 , 9260.504202 ,
"J" , 34.537 , 68142.85714 , 1000 , 12842.85714 , 20228.57143 , 5271.428571 )
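As a rough illustration of what "long format" means here, the sketch below pivots a cut-down copy of the data (two stations and three of the phytoplankton columns, values copied from the table above). The names small and long are only placeholders for this example.

```r
library(tidyr)

# Two stations and three phytoplankton columns copied from the table above,
# kept small so the reshaped result is easy to read
small <- data.frame(
  Station  = c("A3", "F"),
  CentricD = c(181000, 559000),
  PennateD = c(26500, 0),
  Dinoflag = c(1000, 333.3333333)
)

# One row per Station/phytoplankton pair instead of one column per type
long <- pivot_longer(small, CentricD:Dinoflag,
                     names_to = "phyto", values_to = "val")
long
```

The result has one column holding the type name (phyto) and one holding the count (val), which is the single 'fill' variable and the single y variable that ggplot2 expects.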
Here, I use pivot_longer() from tidyr. This will plot the raw values by station and phytoplankton. Note that if you are providing the y value directly (and not calculating it from the data), you need to use stat="identity" in geom_bar().
dat %>%
  pivot_longer(CentricD:Cilliates, names_to = "phyto", values_to = "val") %>%
  ggplot(aes(x = Station, y = val, fill = phyto)) +
  geom_bar(stat = "identity") +
  theme_bw()
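As a side note, ggplot2 also provides geom_col(), which draws bars from a y value you supply and so does the same job as geom_bar(stat="identity"). A minimal sketch on a cut-down copy of the data (the names small, long and p are placeholders for this example):

```r
library(tidyr)
library(ggplot2)

# Two stations and three phytoplankton columns copied from the table above
small <- data.frame(
  Station  = c("A3", "F"),
  CentricD = c(181000, 559000),
  PennateD = c(26500, 0),
  Dinoflag = c(1000, 333.3333333)
)

long <- pivot_longer(small, CentricD:Dinoflag,
                     names_to = "phyto", values_to = "val")

# geom_col() uses the supplied y values as bar heights and stacks by default
p <- ggplot(long, aes(x = Station, y = val, fill = phyto)) +
  geom_col() +
  theme_bw()
p
```

Which of the two spellings to use is a matter of taste; the resulting plot should be the same.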
If you would rather show each type as a share of the station total, so that every bar has the same height, you could compute that proportion within each Station first and then plot that variable instead.
dat %>%
  pivot_longer(CentricD:Cilliates, names_to = "phyto", values_to = "val") %>%
  group_by(Station) %>%
  mutate(pct = val / sum(val)) %>%
  ggplot(aes(x = Station, y = pct, fill = phyto)) +
  geom_bar(stat = "identity") +
  theme_bw()
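An alternative that skips the group_by()/mutate() step is to let ggplot2 rescale each bar itself with position = "fill". The sketch below is a variation on the approach above, not a replacement for it. It again uses a cut-down copy of the data, and the percent axis labels from the scales package are an optional extra I have added.

```r
library(tidyr)
library(ggplot2)

# Two stations and three phytoplankton columns copied from the table above
small <- data.frame(
  Station  = c("A3", "F"),
  CentricD = c(181000, 559000),
  PennateD = c(26500, 0),
  Dinoflag = c(1000, 333.3333333)
)

long <- pivot_longer(small, CentricD:Dinoflag,
                     names_to = "phyto", values_to = "val")

# position = "fill" rescales each stack so it sums to 1 within a Station
p <- ggplot(long, aes(x = Station, y = val, fill = phyto)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::label_percent()) +
  theme_bw()
p
```

Computing pct explicitly, as in the code above, has the advantage that the proportions are available in the data if you also want to label or tabulate them.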
Created on 2024-12-04 with reprex v2.1.0