r ggplot2 ggalluvial

Group color/flow so that the bars create a bar chart of first


I have a dataset that looks something like this:

results <-  as.data.frame(cbind(c("Violence", "Violence", "Violence", "Violence", "Economic", "Economic","Economic","Economic","Institutional","Institutional","Institutional","Institutional"), 
                                c("No", "No", "Yes", "Yes","No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes"),
                                c("Yes", "No", "Yes", "No","Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"), 
                                c(3,3,1,3,4,5,8,7,6,5,4,3)))
colnames(results) <- c("Type", "Test1", "Test2", "Freq")

Then I create an alluvial plot with ggalluvial:

library(ggplot2)
library(tidyverse)
library(ggalluvial)

ggplot(data = results,
       aes(axis1 = Type, axis2 = Test1, axis3 = Test2,
           y = Freq)) +
  scale_x_discrete(limits = c("Article", "False 0s Removed", "New Flow Measure"), expand = c(.2, .05)) +
  xlab("Results") +
  geom_flow(aes(fill = Type)) +
  geom_stratum() +
  geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
  theme_minimal() +
  ggtitle("Replication Summary")


This looks fine, except that within each stratum I want the vertical order to be grouped by color (Type), so that each stratum works as a bar chart of sorts in which I can see what percentage of each Type is No and Yes for each test. How can I change the plot so that the vertical ordering is grouped by color (Type) at each stratum (Test1 and Test2)? Currently the second stratum (Test1) looks good, but the third (Test2) does not.


Solution

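  • If I understand the question correctly, the only thing you have to add is aes.bind = 'flow' in geom_flow(), so that the fill aesthetic (Type) is taken into account when the flows are ordered within each stratum.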

    results <-  data.frame(Type = c("Violence", "Violence", "Violence", "Violence", "Economic", "Economic","Economic","Economic","Institutional","Institutional","Institutional","Institutional"), 
                           Test1 = c("No", "No", "Yes", "Yes","No", "No", "Yes", "Yes", "No", "No", "Yes", "Yes"),
                           Test2 = c("Yes", "No", "Yes", "No","Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"), 
                           Freq = c(3,3,1,3,4,5,8,7,6,5,4,3)
                           )
    
    
    library(ggplot2)
    library(tidyverse)
    library(ggalluvial)
    
    ggplot(data = results,
           aes(axis1 = Type, axis2 = Test1, axis3 = Test2,
               y = Freq)) +
      scale_x_discrete(limits = c("Article", "False 0s Removed", "New Flow Measure"), expand = c(.2, .05)) +
      xlab("Results") +
      geom_flow(aes(fill = Type), aes.bind = 'flow') +
      geom_stratum() +
      geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
      theme_minimal() +
      ggtitle("Replication Summary")
    

    Created on 2023-03-07 by the reprex package (v2.0.1)
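
    As a cross-check on what the reordered strata should show, the shares of No and Yes within each Type can also be computed directly. This is only a sketch in base R, not part of the original plot code; the names share_test1 and share_test2 are made up for illustration, and the data is the same as above, written more compactly with rep().

    ```r
    # Same example data as above, with Freq stored as a numeric column
    results <- data.frame(
      Type  = rep(c("Violence", "Economic", "Institutional"), each = 4),
      Test1 = rep(c("No", "No", "Yes", "Yes"), times = 3),
      Test2 = rep(c("Yes", "No"), times = 6),
      Freq  = c(3, 3, 1, 3, 4, 5, 8, 7, 6, 5, 4, 3)
    )

    # Sum Freq for every Type x answer combination,
    # then divide each row (one Type) by its row total
    share_test1 <- prop.table(xtabs(Freq ~ Type + Test1, data = results), margin = 1)
    share_test2 <- prop.table(xtabs(Freq ~ Type + Test2, data = results), margin = 1)

    # Shares in percent, one row per Type
    print(round(100 * share_test1, 1))
    print(round(100 * share_test2, 1))
    ```

    For example, Violence has 6 of 10 cases with Test1 = No, so the "No" part of the Violence color in the Test1 stratum should take up 60% of that color's total height.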

    EDIT: I am not quite sure whether it is possible to get the colors into the strata with geom_flow(), but you can do it with geom_alluvium(). For this I changed the way the example data is generated, because in your example Freq was not numeric and geom_alluvium() threw an error. Now you can add the fill aesthetic to geom_stratum(). If a stratum cannot be filled by a single Type, its color will be NA. If you add scale_fill_discrete(na.value = NA), these strata become transparent and you can see the colors of the alluvia underneath.

    ggplot(data = results,
           aes(axis1 = Type, axis2 = Test1, axis3 = Test2,
               y = Freq)) +
      scale_x_discrete(limits = c("Article", "False 0s Removed", "New Flow Measure"), expand = c(.2, .05)) +
      xlab("Results") +
      geom_alluvium(aes(fill = Type), aes.bind = "alluvia") +
      geom_stratum(aes(fill = Type)) +
      scale_fill_discrete(na.value = NA) +
      geom_text(stat = "stratum", aes(label = after_stat(stratum))) +
      theme_minimal() +
      ggtitle("Replication Summary")
    

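
    The reason Freq was not numeric in the question's data is that cbind() builds a matrix, and a matrix holds a single type, so the numbers are coerced to character alongside the labels. The snippet below is a small sketch of that behavior and of two possible fixes; the data frames coerced and typed are toy examples made up for illustration.

    ```r
    # Mixing character and numeric vectors in cbind() yields a character matrix,
    # so every column of the resulting data frame, including Freq, is non-numeric
    coerced <- as.data.frame(cbind(c("Violence", "Economic"), c(3, 4)))
    colnames(coerced) <- c("Type", "Freq")
    print(is.numeric(coerced$Freq))  # FALSE

    # Option 1: build the data frame column by column, so each keeps its own type
    typed <- data.frame(Type = c("Violence", "Economic"), Freq = c(3, 4))
    print(is.numeric(typed$Freq))    # TRUE

    # Option 2: repair an existing data frame; as.character() first guards
    # against factor columns (the default in R versions before 4.0)
    coerced$Freq <- as.numeric(as.character(coerced$Freq))
    print(is.numeric(coerced$Freq))  # TRUE
    ```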