
ggplot secondary y axis scale based on data with facet_wrap or grid.arrange


My data consists of 25 sectors observed over time. For each sector I want to plot the number of workers (series 1) and the average pay (series 2) in a line graph, with the number of workers on the primary y axis and the average pay on the secondary y axis, and then arrange the graphs on a grid.

Example data:

period  avg_wage  number_of_workers  sector
1990    2000      5000               construction
1991    2020      4970               construction
1992    2050      5050               construction
1990    1000      120                IT
1991    1100      400                IT
1992    1080      500                IT
1990    10000     900                hospital staff
1991    10200     980                hospital staff
1992    10400     1200               hospital staff

I tried to use facet_wrap() for the grid and scale_y_continuous(sec.axis...) as follows:

#fake sample data for reference
dfa=data.frame(order=seq(1,100),workers=rnorm(1000,7),pay=rnorm(1000,3000,500),type="a") #1st sector
dfb=data.frame(order=seq(1,100),workers=rnorm(1000,25),pay=rnorm(1000,1000,500),type="b") #2nd sector
dfc=data.frame(order=seq(1,100),workers=rnorm(1000,400),pay=rnorm(1000,5000,500),type="c") #3rd sector
df=rbind(dfa,dfb,dfc)
colnames(df)=c(
  "order", #shared x axis/time value
  "workers", #time series 1 (y values for left side y axis)
  "pay", #time series 2 (y values for right side y axis)
  "type" #different graphs to put on the grid
)

Plotting the data with ggplot:

df=df %>% group_by(l=type) %>% mutate(coeff=max(pay)/max(workers)) %>% ungroup() #creating a coefficient to scale the secondary axis
plot=ggplot(data=df,aes(x=order))+
  geom_line(aes(y=workers),linetype="dashed",color="red")+
  geom_line(aes(y=pay/coeff)) +
  scale_y_continuous(sec.axis=sec_axis(~.*coeff2,name="wage"))+
  facet_wrap(~type,scale="free")

Unfortunately this doesn't work, since you can't use data inside the function sec_axis() (this example doesn't even run).

Another approach I tried is using a for loop and grid.arrange():

plots=list()
for (i in (unique(df$type)))
{
  singlesector=df[df$type==i,]
  axiscoeff=df$coeff[1]
  plot=ggplot(data=singlesector,aes(x=order))+
    geom_line(aes(y=workers),linetype="dashed",color="red")+
    geom_line(aes(y=pay/coeff)) + labs(title=i)+
    scale_y_continuous(sec.axis=sec_axis(~.*axiscoeff,name="wage"))
  plots[[i]]=plot
    
}
grid.arrange(grobs=plots)

But this also doesn't work because ggplot doesn't save the various values of the variable axiscoeff so it applies the first value to all of the graphs.

See the result (the axes on the right are messed up and don't conform to the red line's data):

[Image: the secondary axis doesn't make any sense for b and c]

Is there any way to do what I want to do? I thought about saving all of the plots separately as PNG files and then joining them in some other way, but that seems like an extreme solution that would take too much time to figure out.


Solution

  • As far as I can tell, the issue is the way you rescale your data. Using max(pay) / max(workers) maps the maximum of pay onto the maximum of workers, but it does not account for the different ranges (spreads) of the two variables.

    Instead you could use scales::rescale to rescale your data such that the range of pay is mapped on the range of workers.
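
    To make the range-on-range idea concrete, here is a minimal base-R sketch of the linear map that scales::rescale(x, to, from) applies to numeric input. map_range is a hypothetical helper written only for this illustration, and the numbers are made up. The forward call is what puts pay on the primary axis; the backward call is what the secondary axis has to do to label that axis in pay units again.

    ```r
    # Hypothetical helper (not part of any package): linearly map x from the
    # interval `from` onto the interval `to`, as scales::rescale(x, to, from) does.
    map_range <- function(x, to, from) {
      (x - from[1]) / (from[2] - from[1]) * (to[2] - to[1]) + to[1]
    }

    # Made-up values for one sector.
    pay     <- c(1000, 1100, 1080)
    workers <- c(120, 400, 500)

    range_pay     <- range(pay)
    range_workers <- range(workers)

    # Forward: squeeze pay into the workers range so both lines share the primary axis.
    pay_on_primary <- map_range(pay, to = range_workers, from = range_pay)

    # Backward: turn primary-axis positions back into pay values (the secondary axis).
    pay_recovered <- map_range(pay_on_primary, to = range_pay, from = range_workers)
    ```

    Because the two maps are exact inverses, the secondary axis labels line up with the original pay values regardless of how different the two ranges are.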

    Besides that, I took a different approach to glue the plots together, using patchwork. I put the plotting code in a function, split the data by type, use lapply to loop over the split data, and finally glue the plots together using patchwork::wrap_plots.

    Note: As your example data included multiple values per order/type I slightly changed it to get rid of the zig-zag lines.

    library(dplyr)
    library(ggplot2)
    library(patchwork)
    library(scales)
    
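    # Build one plot per sector, then stack them with patchwork.
    df %>% 
      split(.$type) %>% 
      lapply(function(df) {
        # Per-sector ranges define the linear map between the two axes.
        range_pay <- range(df$pay)
        range_workers <- range(df$workers)
        ggplot(data = df, aes(x = order)) +
          geom_line(aes(y = workers), linetype = "dashed", color = "red") +
          # Draw pay on the primary axis by mapping its range onto the workers range.
          geom_line(aes(y = rescale(pay, to = range_workers, from = range_pay))) +
          # The secondary axis applies the inverse map so its labels read as pay.
          scale_y_continuous(sec.axis = sec_axis(~ rescale(.x, to = range_pay, from = range_workers), name = "wage")) +
          facet_wrap(~type)  # adds the sector name as a strip title
      }) %>% 
      wrap_plots(ncol = 1)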
    

    DATA

    set.seed(123)
    dfa <- data.frame(order = 1:100, workers = rnorm(100, 7), pay = rnorm(100, 3000, 500), type = "a") # 1st sector
    dfb <- data.frame(order = 1:100, workers = rnorm(100, 25), pay = rnorm(100, 1000, 500), type = "b") # 2nd sector
    dfc <- data.frame(order = 1:100, workers = rnorm(100, 400), pay = rnorm(100, 5000, 500), type = "c") # 3rd sector
    df <- rbind(dfa, dfb, dfc)
    names(df) <- c("order", "workers", "pay", "type")
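
    A side note on the for loop from the question, as far as I can tell: axiscoeff is read from df$coeff[1], which is the first sector's coefficient on every iteration (singlesector$coeff[1] would pick the per-sector value). Separately, a formula passed to sec_axis is only evaluated when the plot is drawn, so a variable assigned inside a for loop is looked up after the loop has finished. The base-R sketch below shows that scoping behaviour without ggplot2; coeffs, fs_loop and fs_apply are made-up names for this illustration only.

    ```r
    # Made-up coefficients, one per sector.
    coeffs <- c(a = 400, b = 40, c = 12)

    # for loop: every function looks up `k` in the same environment,
    # so all of them see the value left over from the last iteration.
    fs_loop <- list()
    for (nm in names(coeffs)) {
      k <- coeffs[[nm]]
      fs_loop[[nm]] <- function(x) x * k
    }

    # lapply: each call of the outer function has its own environment,
    # so each returned function keeps its own `k`.
    fs_apply <- lapply(coeffs, function(k) {
      force(k)
      function(x) x * k
    })

    res_loop  <- sapply(fs_loop,  function(f) f(1))  # 12, 12, 12
    res_apply <- sapply(fs_apply, function(f) f(1))  # 400, 40, 12
    ```

    This is one reason the lapply version above behaves as expected: range_pay and range_workers live in a separate environment for each sector.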