Tags: r, ggplot2, ggalluvial

Alluvial diagram with varying aesthetic / colors over one flow in R based on ggplot2


I created an alluvial diagram in R with the alluvial package as follows:

library(alluvial)

df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
                 Pearson1 = c(6, 3, 2, 5, 4, 1),
                 Spearman1 = c(6, 5, 1, 2, 3, 4),
                 Kendall1 = c(6, 5, 1, 2, 3, 4),
                 Pearson2 = c(6, 5, 1, 2, 3, 4),
                 Spearman2 = c(6, 5, 1, 2, 4, 3),
                 Kendall2 = c(6, 5, 1, 2, 3, 4))
## each row is one alluvium with weight 1
df$freq <- 1
## df[1:7] selects the seven axis columns and leaves out freq
alluvial(df[1:7], freq = df$freq, cex = 0.7, col = "red")

which results in an alluvial diagram with every flow drawn in red (image not shown).

How can I give specific flows a color other than red? For example, X1 from Variable to Pearson1 and again from Kendall1 to Spearman2, and X3 across all axes. As far as I can tell, this is not possible with alluvial(). How can I recreate the diagram above with another function?


Solution

  • ggalluvial allows aesthetics to vary along a single "flow" (or alluvium). The documentation describes a trick for this: use geom_flow with stat = "alluvium" and set lode.guidance = "frontback".

    The aesthetic itself (here, the color) has to be added to the data. geom_flow and geom_stratum need separate columns for it (try using the same column for both to see what happens). I pass the color names directly and use scale_fill_identity, but you can of course also use arbitrary labels and then define your colors with scale_fill_manual.

    library(ggalluvial)
    #> Loading required package: ggplot2
    library(tidyverse)
    df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
                     Pearson1 = c(6, 3, 2, 5, 4, 1),
                     Spearman1 = c(6, 5, 1, 2, 3, 4),
                     Kendall1 = c(6, 5, 1, 2, 3, 4),
                     Pearson2 = c(6, 5, 1, 2, 3, 4),
                     Spearman2 = c(6, 5, 1, 2, 4, 3),
                     Kendall2 = c(6, 5, 1, 2, 3, 4))
    
    df_long <-
      df %>%
      ## reshape your data in order to bring it to the right shape
      mutate(across(everything(), as.character)) %>%
      rownames_to_column("ID") %>%
      pivot_longer(-ID) %>%
      ## correct order of your x
      mutate(
        name = factor(name, levels = names(df)),
        ## now hard-code where you want to change the color.
        ## lodes need different highlighting than your strata: with geom_flow's default
        ## aes.flow = "forward", each flow takes the fill of the lode it starts from
        ## there are of course many ways to add this information; I am using case_when here
        ## you could also create separate vectors and add them to your data frame
        ## ID is a character column (from rownames_to_column), so compare against strings
        highlight_lode = case_when(
          ID == "3" ~ "blue",
          ID == "1" & name %in% c("Variable", "Kendall1", "Pearson2") ~ "orange",
          TRUE ~ "red"
        ),
        highlight_stratum = case_when(
          ID == "3" ~ "blue",
          ID == "1" & name %in% c(
            "Variable", "Pearson1", "Kendall1", "Pearson2",
            "Spearman2"
          ) ~ "orange",
          TRUE ~ "red"
        )
      )
    
    ggplot(df_long, 
           ## now use different color aesthetics in geom_flow and geom_stratum
           aes(x = name, stratum = value, alluvium = ID, label = value)) +
      ## I took this trick with lode guidance from the documentation - this allows varying aesthetics over one flow. 
      geom_flow(aes(fill = highlight_lode), stat = "alluvium", lode.guidance = "frontback", color = "darkgray") +
      geom_stratum(aes(fill = highlight_stratum)) +
      geom_text(stat = "stratum") +
      ## as I have named the colors directly, it is appropriate to use scale_identity
      scale_fill_identity()
    #> Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
    #> ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
    

    Created on 2023-01-29 with reprex v2.0.2
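
The scale_fill_manual alternative mentioned above could look roughly like the sketch below. It is only an illustration, not part of the original reprex: the column names lode_group and stratum_group, the labels "X1", "X3" and "other", and the vector highlight_cols are all hypothetical names picked for this example. The data columns hold labels instead of color names, and the colors are assigned in one place in the scale, which also gives you a legend.

```r
library(ggalluvial) # also attaches ggplot2
library(dplyr)
library(tidyr)

df <- data.frame(Variable = c("X1", "X2", "X3", "X4", "X5", "X6"),
                 Pearson1 = c(6, 3, 2, 5, 4, 1),
                 Spearman1 = c(6, 5, 1, 2, 3, 4),
                 Kendall1 = c(6, 5, 1, 2, 3, 4),
                 Pearson2 = c(6, 5, 1, 2, 3, 4),
                 Spearman2 = c(6, 5, 1, 2, 4, 3),
                 Kendall2 = c(6, 5, 1, 2, 3, 4))

df_long <-
  df %>%
  mutate(across(everything(), as.character)) %>%
  ## one ID per row, stored as character
  mutate(ID = as.character(row_number())) %>%
  pivot_longer(-ID) %>%
  mutate(
    name = factor(name, levels = names(df)),
    ## hypothetical labels instead of literal color names
    lode_group = case_when(
      ID == "3" ~ "X3",
      ID == "1" & name %in% c("Variable", "Kendall1", "Pearson2") ~ "X1",
      TRUE ~ "other"
    ),
    stratum_group = case_when(
      ID == "3" ~ "X3",
      ID == "1" & name %in% c(
        "Variable", "Pearson1", "Kendall1", "Pearson2", "Spearman2"
      ) ~ "X1",
      TRUE ~ "other"
    )
  )

## the label -> color mapping lives in one place (assumed palette)
highlight_cols <- c(X1 = "orange", X3 = "blue", other = "red")

p <- ggplot(df_long,
            aes(x = name, stratum = value, alluvium = ID, label = value)) +
  geom_flow(aes(fill = lode_group), stat = "alluvium",
            lode.guidance = "frontback", color = "darkgray") +
  geom_stratum(aes(fill = stratum_group)) +
  geom_text(stat = "stratum") +
  scale_fill_manual(values = highlight_cols, name = "Highlight")

p
```

Changing a color then only means editing highlight_cols, and the case_when rules stay untouched. If you do not want the legend, adding guides(fill = "none") should hide it.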