My data consists of 25 sectors observed over time. For each sector I want a line graph of the number of workers (series 1) and the average pay (series 2), with the number of workers on the primary y axis and the average pay on the secondary y axis, and then I want to arrange the graphs on a grid.
Example data:
period | avg_wage | number_of_workers | sector |
---|---|---|---|
1990 | 2000 | 5000 | construction |
1991 | 2020 | 4970 | construction |
1992 | 2050 | 5050 | construction |
1990 | 1000 | 120 | IT |
1991 | 1100 | 400 | IT |
1992 | 1080 | 500 | IT |
1990 | 10000 | 900 | hospital staff |
1991 | 10200 | 980 | hospital staff |
1992 | 10400 | 1200 | hospital staff |
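For reference, the example table above can be entered in R as below. Mapping period, number_of_workers, avg_wage and sector onto the order, workers, pay and type columns used in the code is an assumption about how the real data lines up with the fake sample data; sectors and df_real are illustrative names.

```r
# The example table as a data frame (values copied from the table above)
sectors <- data.frame(
  period            = rep(1990:1992, times = 3),
  avg_wage          = c(2000, 2020, 2050, 1000, 1100, 1080, 10000, 10200, 10400),
  number_of_workers = c(5000, 4970, 5050, 120, 400, 500, 900, 980, 1200),
  sector            = rep(c("construction", "IT", "hospital staff"), each = 3)
)

# Assumed renaming to the column names used in the plotting code
df_real <- data.frame(
  order   = sectors$period,
  workers = sectors$number_of_workers,
  pay     = sectors$avg_wage,
  type    = sectors$sector
)
str(df_real)
```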
I tried using facet_wrap() for the grid and scale_y_continuous(sec.axis = ...) for the secondary axis, as follows:
```r
#fake sample data for reference
dfa=data.frame(order=seq(1,100),workers=rnorm(1000,7),pay=rnorm(1000,3000,500),type="a") #1st sector
dfb=data.frame(order=seq(1,100),workers=rnorm(1000,25),pay=rnorm(1000,1000,500),type="b") #2nd sector
dfc=data.frame(order=seq(1,100),workers=rnorm(1000,400),pay=rnorm(1000,5000,500),type="c") #3rd sector
df=rbind(dfa,dfb,dfc)
colnames(df)=c(
  "order", #shared x axis/time value
  "workers", #time series 1 (y values for the left-side y axis)
  "pay", #time series 2 (y values for the right-side y axis)
  "type" #different graphs to put on the grid
)
```
Plotting the data:
```r
library(dplyr)
library(ggplot2)

#creating a coefficient to scale the secondary axis
df=df %>% group_by(type) %>% mutate(coeff=max(pay)/max(workers)) %>% ungroup()
plot=ggplot(data=df,aes(x=order))+
  geom_line(aes(y=workers),linetype="dashed",color="red")+
  geom_line(aes(y=pay/coeff)) +
  scale_y_continuous(sec.axis=sec_axis(~.*coeff,name="wage"))+ #fails when drawn: sec_axis() can't see the coeff column
  facet_wrap(~type,scales="free")
```
Unfortunately, this doesn't work, because sec_axis() can't take values from the data (this example doesn't even run).
Another approach I tried uses a for loop and gridExtra::grid.arrange():
```r
library(ggplot2)
library(gridExtra)

plots=list()
for (i in unique(df$type)) {
  singlesector=df[df$type==i,]
  axiscoeff=df$coeff[1]
  plot=ggplot(data=singlesector,aes(x=order))+
    geom_line(aes(y=workers),linetype="dashed",color="red")+
    geom_line(aes(y=pay/coeff)) + labs(title=i)+
    scale_y_continuous(sec.axis=sec_axis(~.*axiscoeff,name="wage"))
  plots[[i]]=plot
}
grid.arrange(grobs=plots)
```
But this doesn't work either, because ggplot doesn't keep each iteration's value of axiscoeff, so it applies the first value to all of the graphs.
See the result (the axes on the right are messed up and don't match the red line's data).
Is there any way to do this? I considered saving each plot as a separate PNG and then joining them some other way, but that seems like an extreme solution that would take too long to figure out.
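Part of the behaviour comes from how R looks up variables rather than from ggplot itself. A formula such as ~.*axiscoeff created in a top-level for loop keeps the global environment, so axiscoeff is only looked up when the plot is drawn, by which point it holds whatever the last iteration assigned. (In the loop above, axiscoeff is also read from df rather than singlesector, so it is the first sector's coefficient in every iteration anyway.) The base-R sketch below reproduces the lookup behaviour with plain functions instead of ggplot objects; fs_loop and fs_lapply are illustrative names.

```r
# Functions created in a top-level for loop look up `k` in the global
# environment when they are called, not when they are created.
fs_loop <- list()
for (k in 1:3) {
  fs_loop[[k]] <- function(x) x * k
}
sapply(fs_loop, function(f) f(1))    # 3 3 3: every function sees the last k

# lapply() gives each call its own environment; force() evaluates k right away,
# so each function keeps the k it was created with.
fs_lapply <- lapply(1:3, function(k) {
  force(k)
  function(x) x * k
})
sapply(fs_lapply, function(f) f(1))  # 1 2 3
```

The same reasoning would apply to the formula passed to sec_axis(): building each plot inside a function call gives every formula its own environment.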
As far as I can tell, the issue is the way you (re)scale your data. Using max(pay) / max(workers) maps the maximum of pay onto the maximum of workers, but it does not take into account the different ranges (spreads) of the two variables.
Instead, you could use scales::rescale to rescale your data so that the range of pay is mapped onto the range of workers.
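A small worked example of the difference, using made-up numbers. rescale_range() is a hypothetical helper that performs the same linear map as scales::rescale(x, to, from), written out in base R so the arithmetic is visible and the sketch runs without extra packages.

```r
# Hypothetical helper: linearly map x from the interval `from` onto `to`
# (same arithmetic as scales::rescale(x, to, from))
rescale_range <- function(x, to, from) {
  to[1] + (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1])
}

workers <- c(5, 7, 10)
pay     <- c(1000, 2000, 3000)

# Approach from the question: divide by max(pay) / max(workers)
coeff <- max(pay) / max(workers)   # 300
pay / coeff                        # roughly 3.33 6.67 10: only the maxima line up

# Range-based rescaling: min maps onto min and max maps onto max
pay_scaled <- rescale_range(pay, range(workers), range(pay))
pay_scaled                         # 5.0 7.5 10.0

# The secondary axis needs the inverse map, from the workers range back to pay
rescale_range(pay_scaled, range(pay), range(workers))  # 1000 2000 3000
```

The first call corresponds to rescale(pay, range_workers, range_pay) in the plotting code, and the inverse call to the transform passed to sec_axis().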
Besides that, I took a different approach to gluing the plots together, using patchwork. I put the plotting code in a function, split the data by type, use lapply to loop over the split data, and finally glue the plots together with patchwork::wrap_plots. Because every call of the function computes its own range_pay and range_workers, each plot gets its own secondary axis.
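Stripped of the plotting, the split/lapply pattern looks like this (toy data, illustrative names): each call of the anonymous function receives one sector's rows, so the ranges it computes belong to that sector alone.

```r
# Toy data: two sectors on very different scales
toy <- data.frame(
  order   = rep(1:3, times = 2),
  workers = c(5, 7, 10, 100, 150, 200),
  pay     = c(1000, 2000, 3000, 500, 700, 900),
  type    = rep(c("a", "b"), each = 3)
)

# split() returns a named list with one data frame per type
by_type <- split(toy, toy$type)
names(by_type)          # "a" "b"

# lapply() calls the function once per sector; the ranges are local to each call
ranges <- lapply(by_type, function(d) {
  list(range_pay = range(d$pay), range_workers = range(d$workers))
})
ranges$a$range_pay      # 1000 3000
ranges$b$range_workers  # 100 200
```

In the answer's code the function returns a ggplot object instead of a list, and wrap_plots() receives the resulting list of plots.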
Note: As your example data included multiple values per order/type combination, I slightly changed it to get rid of the zig-zag lines.
```r
library(dplyr)
library(ggplot2)
library(patchwork)
library(scales)

df %>%
  split(.$type) %>%
  lapply(function(df) {
    range_pay <- range(df$pay)
    range_workers <- range(df$workers)
    ggplot(data = df, aes(x = order)) +
      geom_line(aes(y = workers), linetype = "dashed", color = "red") +
      geom_line(aes(y = rescale(pay, range_workers, range_pay))) +
      scale_y_continuous(sec.axis = sec_axis(~ rescale(.x, range_pay, range_workers), name = "wage")) +
      facet_wrap(~type)
  }) %>%
  wrap_plots(ncol = 1)
```
DATA
```r
set.seed(123)
dfa <- data.frame(order = 1:100, workers = rnorm(100, 7), pay = rnorm(100, 3000, 500), type = "a") # 1st sector
dfb <- data.frame(order = 1:100, workers = rnorm(100, 25), pay = rnorm(100, 1000, 500), type = "b") # 2nd sector
dfc <- data.frame(order = 1:100, workers = rnorm(100, 400), pay = rnorm(100, 5000, 500), type = "c") # 3rd sector
df <- rbind(dfa, dfb, dfc)
names(df) <- c("order", "workers", "pay", "type")
```