Connect stacked bar charts with multiple groups with lines or segments using ggplot2


I am conducting a study of a number of patients with a disease, using an ordinal-scale assessment of functional status at 3 different time points. I want to connect the corresponding groups in stacked bar charts across these time points.

I looked at these topics and haven't gotten it to work using their suggestions:

How to position lines at the edges of stacked bar charts

Is there an efficient way to draw lines between different elements in a stacked bar plot using ggplot2?

Draw lines between different elements in a stacked bar plot

Please see the graphical representation of how I ultimately want this figure to look in R (generated in PRISM). It shows the frequencies of each of the 7 ordinal values (scores 0 to 6) across the three time points (the top group has no patients with ordinal scores 3, 5, or 6):

[Figure: intended figure, generated in PRISM]

Data:

library(tidyverse)

mrs <- tibble(
  Score = c(0,1,2,3,4,5,6),
  pMRS = c(17,  2,   1,  0,  1,  0,   0),
  dMRS = c(2,  3,   2,  6,  4,  2,  2),
  fMRS = c(4,  4,  5,  4,  1,  1,  2)
)

And this is the code that I've tried so far, before I run into issues using either geom_line or geom_segment (I left out those lines because they currently just distort the figure):

mrs <- mrs %>%
   # Convert counts to percentages within each time point
   mutate(across(-Score, ~ round(prop.table(.x) * 100, 2))) %>%
   pivot_longer(cols = c("pMRS", "dMRS", "fMRS"), names_to = "timepoint") %>%
   mutate(timepoint = factor(timepoint,
                             levels = c("fMRS",
                                        "dMRS",
                                        "pMRS"))) %>%
   # Order the score levels from 6 down to 0
   mutate(Score = factor(as.character(Score),
                         levels = c("6","5","4","3","2","1","0")))
mrs %>% ggplot(aes(y= timepoint, x= value, fill= Score))+
  geom_bar(color= "black", width = 0.6, stat= "identity") +
  scale_fill_manual(name= NULL,
                    breaks = c("6","5","4","3","2","1","0"), values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_discrete(breaks=c("pMRS",
                            "dMRS",
                            "fMRS"),
                   labels=c("Pre-mRS,  (N=21)",
                            "Discharge mRS,  (N=21)",
                            "Followup mRS,  (N=21)"))+
  theme_classic()

Solution

  • I don't think there is an easy way of doing this; you'd have to add these lines (semi-)manually yourself. What I'm proposing below is adapted from this answer and applied to your case. In essence, it exploits the fact that geom_area() is stackable just like the bar chart is. The downside is that you have to enter by hand the y-coordinates where the bars start and end, and you have to do so for each pair of adjacent stacked bars.

    library(tidyverse)
    
    # mrs <- tibble(...) %>% mutate(...) # omitted for brevity, same as question
    
    mrs %>% ggplot(aes(x= value, y= timepoint, fill= Score))+
      geom_bar(color= "black", width = 0.6, stat= "identity") +
      geom_area(
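        # Last two stacked bars (dMRS at y = 2, pMRS at y = 3)
        data = ~ subset(.x, timepoint %in% c("pMRS", "dMRS")),
        # These exact values depend on the 'width' of the bars:
        # with width = 0.6, each bar extends 0.3 either side of its position
        aes(y = c("pMRS" = 2.7, "dMRS" = 2.3)[as.character(timepoint)]),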
        position = "stack", outline.type = "both", 
        # Alpha set to 0 to hide the fill colour
        alpha = 0, colour = "black",
        orientation = "y"
      ) +
      geom_area(
        # First two stacked bars (fMRS at y = 1, dMRS at y = 2)
        data = ~ subset(.x, timepoint %in% c("dMRS", "fMRS")),
        aes(y = c("dMRS" = 1.7, "fMRS" = 1.3)[as.character(timepoint)]),
        position = "stack", outline.type = "both", alpha = 0, colour = "black",
        orientation = "y"
      ) +
      scale_fill_manual(name= NULL,
                        breaks = c("6","5","4","3","2","1","0"),
                        values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
      scale_y_discrete(breaks=c("pMRS",
                                "dMRS",
                                "fMRS"),
                       labels=c("Pre-mRS,  (N=21)",
                                "Discharge mRS,  (N=21)",
                                "Followup mRS,  (N=21)"))+
      theme_classic()
    

    Arguably, making a separate data.frame for the lines is more straightforward, but also a bit messier.
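
For comparison, here is a minimal sketch of that separate-data-frame alternative, using geom_segment() instead of geom_area(). It is not from the original answer: the names `mrs_raw`, `bounds`, `connect()` and `line_df` are made up for illustration, and it assumes that ggplot2 stacks in reverse factor-level order, so that with levels 6 to 0 the score-0 segment starts at x = 0. The percentages are left unrounded so that every bar ends at exactly 100.

```r
library(tidyverse)

# Counts copied from the question
mrs_raw <- tibble(
  Score = 0:6,
  pMRS = c(17, 2, 1, 0, 1, 0, 0),
  dMRS = c(2, 3, 2, 6, 4, 2, 2),
  fMRS = c(4, 4, 5, 4, 1, 1, 2)
)

bar_width  <- 0.6
timepoints <- c("fMRS", "dMRS", "pMRS")  # y positions 1, 2, 3

# Long format with percentages within each time point
mrs_long <- mrs_raw %>%
  mutate(across(-Score, ~ prop.table(.x) * 100)) %>%
  pivot_longer(-Score, names_to = "timepoint") %>%
  mutate(timepoint = factor(timepoint, levels = timepoints))

# Right-hand edge of every segment: cumulative percentage from score 0 upwards
# (assumes score 0 is the segment that starts at x = 0, see the note above)
bounds <- mrs_long %>%
  group_by(timepoint) %>%
  arrange(Score, .by_group = TRUE) %>%
  mutate(edge = cumsum(value)) %>%
  ungroup()

# Hypothetical helper: one row per score, linking the edges of two adjacent bars
connect <- function(lower, upper) {
  inner_join(
    bounds %>% filter(timepoint == lower) %>% select(Score, x = edge),
    bounds %>% filter(timepoint == upper) %>% select(Score, xend = edge),
    by = "Score"
  ) %>%
    mutate(
      y    = match(lower, timepoints) + bar_width / 2,  # top edge of lower bar
      yend = match(upper, timepoints) - bar_width / 2   # bottom edge of upper bar
    )
}

line_df <- bind_rows(connect("fMRS", "dMRS"), connect("dMRS", "pMRS"))

# Score becomes a factor only for plotting, with levels from 6 down to 0
mrs_long <- mrs_long %>% mutate(Score = factor(Score, levels = 6:0))

p <- ggplot(mrs_long, aes(x = value, y = timepoint)) +
  geom_col(aes(fill = Score), colour = "black", width = bar_width) +
  geom_segment(
    data = line_df,
    aes(x = x, xend = xend, y = y, yend = yend),
    inherit.aes = FALSE
  ) +
  scale_fill_manual(
    name = NULL,
    breaks = c("6", "5", "4", "3", "2", "1", "0"),
    values = c("#000000", "#294e63", "#496a80", "#7c98ac",
               "#b3c4d2", "#d9e0e6", "#ffffff")
  ) +
  theme_classic()

p
```

Because the y offsets are derived from `bar_width`, changing the bar width moves the line ends along with the bars, which is the part that has to be typed in by hand in the geom_area() version. The scale_y_discrete() labels from the question can be added to `p` unchanged.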