I am new to R and I need to create a stacked bar chart that shows the percentage within each bar (stack) and runs from 0 to 100 percent. I have over 7000 rows and 12 vectors with values ranging from 1 through 5 (the value 9999 is to be excluded from the chart). For each vector I have calculated the percentage of each of the 5 values, so each bar in the chart should sum to 100 %.
The problem I encounter is this: Two of the vectors sum to 99 % and one vector sums to 101 %, while the rest sum to 100 as expected. The chart therefore has two bars/rows that are a bit shorter and one bar/row that is a bit longer than the rest of the bars. This problem disappears if I round to 1 or 2 decimal places, rather than asking for 0 decimal places (which, sadly, is what I have to do).
Is there any way to force ggplot to show bars/rows of identical length (100 %)?
Any help is greatly appreciated!
I removed the two images of my data and the ggplot. I have produced a working example which simulates my data and shows how bars/rows differ in length when I ask for 0 decimal places.
I am well aware that this issue could be solved by asking for 1 decimal place. My employer wants 0, which is why I hope to find a solution for this.
# tidyverse already attaches ggplot2, dplyr and tidyr
library(tidyverse)
# Creating some data
item1 <- floor(runif(7200, min=1, max=6))
item2 <- floor(runif(7200, min=1, max=6))
item3 <- floor(runif(7200, min=1, max=6))
item4 <- floor(runif(7200, min=1, max=6))
item5 <- floor(runif(7200, min=1, max=6))
item6 <- floor(runif(7200, min=1, max=6))
item7 <- floor(runif(7200, min=1, max=6))
item8 <- floor(runif(7200, min=1, max=6))
item9 <- floor(runif(7200, min=1, max=6))
item10 <- floor(runif(7200, min=1, max=6))
item11 <- floor(runif(7200, min=1, max=6))
item12 <- floor(runif(7200, min=1, max=6))
df <- data.frame(item1, item2, item3, item4, item5, item6, item7, item8, item9, item10, item11, item12)
# Drawing the ggplot with 0 (zero) decimal places
df %>%
  gather() %>%
  group_by(key, value) %>%
  tally() %>%
  # Percent within each item, rounded to whole numbers
  mutate(n = round(n / sum(n) * 100, 0)) %>%
  ggplot(aes(x = key, y = n, fill = as.factor(value))) +
  geom_col(position = position_stack(reverse = TRUE)) +
  labs(title = "Some title", x = " ", y = "%", fill = " ") +
  geom_text(aes(label = n), position = position_stack(reverse = TRUE, vjust = 0.5), size = 3, colour = "white") +
  coord_flip() +
  # theme_minimal() resets earlier theme() settings, so it has to come first
  theme_minimal() +
  theme(legend.position = "top")
Created on 2022-04-06 by the reprex package (v2.0.1)
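A very simple solution to your problem is to change from position = position_stack() to position = position_fill(). position_fill() rescales every stack to a total height of 1, so all bars have the same length even when the rounded values in a bar sum to 99 or 101. The geom_text() layer needs the same position so that the labels stay centred in their segments.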
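To see why the bars differ in the first place and what position_fill() does about it, here is a small sketch in base R. The counts are made up for illustration and are not taken from the question's data.

```r
# Made-up counts for two items with five response categories each
counts_a <- c(206, 206, 206, 206, 176)  # 20.6 % four times, then 17.6 %
counts_b <- c(204, 204, 204, 204, 184)  # 20.4 % four times, then 18.4 %

# Rounding each share separately does not preserve the total
pct_a <- round(counts_a / sum(counts_a) * 100)
pct_b <- round(counts_b / sum(counts_b) * 100)
sum(pct_a)  # 102
sum(pct_b)  # 98

# position_fill() effectively divides each value by its stack total,
# so every stack ends up with a height of 1
fill_a <- pct_a / sum(pct_a)
fill_b <- pct_b / sum(pct_b)
all.equal(sum(fill_a), 1)
all.equal(sum(fill_b), 1)
```

The segment sizes shift very slightly as a result, which is the price for bars of identical length.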
library(tidyverse)
library(scales)
# Creating some data
df <- data.frame(matrix(floor(runif(12*7200, 1, 6)), ncol = 12, dimnames = list(NULL, paste0("item", 1:12))))
# Drawing the ggplot with 0 (zero) decimal places
df %>%
  gather() %>%
  group_by(key, value) %>%
  tally() %>%
  mutate(n = round(n / sum(n) * 100, 0)) %>%
  ggplot(aes(x = key, y = n, fill = as.factor(value))) +
  # position_fill() rescales each bar to a total of 1
  geom_col(position = position_fill(reverse = TRUE)) +
  labs(title = "Some title", x = " ", fill = " ") +
  scale_y_continuous(name = "%", labels = percent) +
  geom_text(aes(label = n), position = position_fill(reverse = TRUE, vjust = 0.5), size = 3, colour = "white") +
  coord_flip() +
  # theme_minimal() resets earlier theme() settings, so it has to come first
  theme_minimal() +
  theme(legend.position = "top")
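Note that with position_fill() the bars are equally long, but the printed labels can still add up to 99 or 101. If the labels themselves must sum to 100, one option is largest-remainder rounding, which is a different technique from the plain round() used above. The following is only a sketch: round_to_100() is a hypothetical helper written for this answer, not a function from any package.

```r
# Hypothetical helper: whole-number percentages that always sum to 100
round_to_100 <- function(x) {
  pct <- x / sum(x) * 100
  fl <- floor(pct)
  # Number of percentage points still missing after rounding down
  k <- round(100 - sum(fl))
  # Hand them out to the values with the largest fractional parts
  idx <- order(pct - fl, decreasing = TRUE)[seq_len(k)]
  fl[idx] <- fl[idx] + 1
  fl
}

# Made-up counts for one item
res <- round_to_100(c(206, 206, 206, 206, 176))
sum(res)
```

In the pipeline above, this would replace the rounding step, for example mutate(n = round_to_100(n)), since the data is still grouped by key after tally(). Some labels will then differ by one point from what round() gives, which may or may not be acceptable for your report.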