These questions helped me but the solution is still not correct.
Stacked bar chart across multiple columns
My data frame:
DevType <- c('Designer', 'Developer, Back', 'Developer, front', 'Engineer')
Salary <- c(120, 340, 72, 400)
Master <- c('1', '2', '3', '4')
Bachelor <- c('6', '1', '3', '1')
University <- c('6', '2', '0', '2')
data1 <- data.frame(DevType, Salary, Master, Bachelor, University)
Following those questions, I reshaped the data into long format with:
data1 <- gather(data1, key, value, -DevType, -Salary)
DevType | Salary | key | value |
---|---|---|---|
Designer | 120 | Master | 1 |
Developer, Back | 340 | Master | 2 |
Developer, front | 72 | Master | 3 |
Engineer | 400 | Master | 4 |
Designer | 120 | Bachelor | 6 |
Developer, Back | 340 | Bachelor | 1 |
Developer, front | 72 | Bachelor | 3 |
Engineer | 400 | Bachelor | 1 |
Designer | 120 | University | 6 |
Developer, Back | 340 | University | 2 |
Developer, front | 72 | University | 0 |
Engineer | 400 | University | 2 |
Now I want a stacked bar plot with DevType on the x-axis and Salary on the y-axis. The bar for each DevType should be subdivided by the value, and the key should be used as the legend.
I have this from the questions:
ggplot(data1, aes(x = DevType, y = Salary)) +
  geom_col(aes(fill = key))
The difference from those questions is that my y-axis is Salary, not the value. The problem is that the bar has the right height for only one key, and the segments for all keys have the same length.
Thanks for any pointers.
Given the back and forth in the comments, it appears that the bars on the chart should sum to the average salary, and what is desired is to see the relative contribution to the average by people with different education levels.
For example, the average salary for Developer, front is 72, and six people contributed to the average, three with a Bachelor degree and three with a Master degree. Therefore, the bar should have a height of 72, and each of the two groups should contribute 36 to the total.
Therefore, we create adjusted salaries based on the weighted contribution to the average.
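As a worked instance of that weighting, here is a minimal base R sketch for a single role. The names `salary`, `counts`, and `adj_salary` are illustrative only and do not appear in the original code; the counts are the ones listed for 'Developer, front' in the data frame.

```r
# Worked example for one role, using the counts from the data frame
salary <- 72                                          # average salary for 'Developer, front'
counts <- c(Master = 3, Bachelor = 3, University = 0) # respondents per education level

# each education level's share of the average, weighted by head count
adj_salary <- salary * counts / sum(counts)
adj_salary

# the shares add back up to the original average salary
sum(adj_salary)
```

The full pipeline applies the same calculation to every role at once.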
library(ggplot2)
library(tidyr)
library(dplyr)
DevType <- c('Designer', 'Developer, Back', 'Developer, front', 'Engineer')
Salary <- c(120, 340, 72, 400)
Master <- c('1', '2', '3', '4')
Bachelor <- c('6', '1', '3', '1')
University <- c('6', '2', '0', '2')
data1 <- data.frame(DevType, Salary, Master, Bachelor, University)
# gather data for subsequent processing
data1 <- data1 %>%
  gather(., key, value, -DevType, -Salary) %>%
  # counts were entered as strings, so convert them to numbers
  type.convert(., as.is = TRUE)
data1 <- data1 %>%
  group_by(DevType) %>%
  # calculate denominators for salaries
  summarise(., salaryCount = sum(value)) %>%
  # merge salary counts
  left_join(., data1, by = "DevType") %>%
  # use number of participants as denominator so sums add up to average
  # salary
  mutate(adjSalary = if_else(value > 0, Salary * value / salaryCount, 0))
# original chart - where y axis is adjusted so total matches average salary
# across participants who contributed to the average
ggplot(data1, aes(x = DevType, y = adjSalary)) +
  geom_col(aes(fill = key))
...and the output, where the bars sum to the original salary levels.
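One way to confirm that claim without looking at the chart is to add the adjusted salaries back up per role. The sketch below is self-contained and assumes the counts are entered as numbers rather than strings; `data_check`, `check`, and `barHeight` are hypothetical names introduced only for this check.

```r
library(dplyr)
library(tidyr)

# same data as above, with the counts stored as numbers (an assumption of this sketch)
data_check <- data.frame(
  DevType    = c('Designer', 'Developer, Back', 'Developer, front', 'Engineer'),
  Salary     = c(120, 340, 72, 400),
  Master     = c(1, 2, 3, 4),
  Bachelor   = c(6, 1, 3, 1),
  University = c(6, 2, 0, 2)
)

check <- data_check %>%
  gather(key, value, -DevType, -Salary) %>%
  group_by(DevType) %>%
  # share of the average salary attributed to each education level
  mutate(adjSalary = Salary * value / sum(value)) %>%
  # one row per role: total stacked bar height next to the original salary
  summarise(Salary = first(Salary), barHeight = sum(adjSalary))

check

# stops with an error if any bar height differs from the original salary
stopifnot(isTRUE(all.equal(check$barHeight, check$Salary)))
```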
A stacked bar chart is helpful when one wants to compare the varying contribution of different categories of a grouping variable to the sum of their values on the y-axis variable. However, it appears from the data that the questioner is trying to compare salary levels for different roles by level of education.
In this case a grouped bar chart is more useful than a stacked one because a grouped chart visually compares categories of a third grouping variable within categories of the x-axis variable.
library(ggplot2)
library(tidyr)
DevType <- c('Designer', 'Developer, Back', 'Developer, front', 'Engineer')
Salary <- c(120, 340, 72, 400)
Master <- c('1', '2', '3', '4')
Bachelor <- c('6', '1', '3', '1')
University <- c('6', '2', '0', '2')
data1 <- data.frame(DevType, Salary, Master, Bachelor, University)
data1 <- gather(data1, key, value, -DevType, -Salary)
# use grouped bar chart instead
ggplot(data1, aes(x = DevType, y = Salary, fill = key)) +
  geom_bar(position = "dodge", stat = "identity")
...and the output:
NOTE: as mentioned in the original post, Salary is constant across the key categories within each DevType, so the grouped bars within a role all have the same height and the chart is not particularly interesting.
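The constant salary within each role can be checked directly by counting the distinct Salary values per DevType in the long data. This is a small sketch; `data_long`, `salary_levels`, and `distinctSalaries` are hypothetical names used only here.

```r
library(dplyr)
library(tidyr)

# same data as in the post
DevType <- c('Designer', 'Developer, Back', 'Developer, front', 'Engineer')
Salary <- c(120, 340, 72, 400)
Master <- c('1', '2', '3', '4')
Bachelor <- c('6', '1', '3', '1')
University <- c('6', '2', '0', '2')
data_long <- data.frame(DevType, Salary, Master, Bachelor, University) %>%
  gather(key, value, -DevType, -Salary)

# number of distinct salary values within each role
salary_levels <- data_long %>%
  group_by(DevType) %>%
  summarise(distinctSalaries = n_distinct(Salary))

salary_levels
```

Every role has exactly one distinct salary, which is why dodging by key produces bars of equal height within each DevType.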