r  ggplot2  charts  graph-visualization  sankey-diagram

Beautifying and sorting some variables in the Sankey/Alluvial diagram using R


I'm trying to improve my data visualization skills, and I have almost got what I wanted, but at some point I got stuck and couldn't go any further. For the record, I've searched extensively here for answers to my questions, and it has helped me a lot.

Here is my data set:

https://app.box.com/s/pp5p5chgypn6ba33anotie7wlxvdu01v

Here is my code:

library(tidyverse)
library(ggalluvial)
library(alluvial)

A_col <- "firebrick3"
B_col <- "darkorange"
C_col <- "aquamarine2"
D_col <- "dodgerblue2"
E_col <- "darkviolet"
F_col <- "chartreuse2"
G_col <- "goldenrod1"
H_col <- "gray73"
set.seed(39)

ggplot(df,
       aes(y = Time, axis1 = Activity, axis2 = Category, axis3 = Positions)) +
  geom_alluvium(aes(fill = Positions, color = Positions), 
        width = 4/12, alpha = 0.5, knot.pos = 0.3) +
  geom_stratum(width = 4/12, color = "grey36") +
  geom_text(stat = "stratum", label.strata = TRUE) +
  scale_x_continuous(breaks = 1:3, 
       labels = c("Activity", "Category", "Positions/Movements"), expand = c(.01, .05)) +
  ylab("Time 24 hours") +
  scale_fill_manual(values  = c(A_col, B_col, C_col, D_col, E_col, F_col, G_col, H_col)) +
  scale_color_manual(values = c(A_col, B_col, C_col, D_col, E_col, F_col, G_col, H_col)) +
  ggtitle("Physical Activity during the week and weekend") +
  theme_minimal() +
  theme(legend.position = "none", panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), axis.text.y = element_blank(), 
        axis.text.x = element_text(size = 12, face = "bold"))

# I also have this code that I run without pre-choosing the colours.
# I like this one because the flow diagram doesn't have any border.

ggplot(df,
       aes(y = Time, axis1 = Activity, axis2 = Category, axis3 = Positions)) +
  scale_x_discrete(limits = c("Activity", "Category", "Positions/Movements"), 
       expand = c(.01, .05)) +
  ylab("Time 24 hours") +
  geom_alluvium(aes(fill = Positions), width = 4/12, alpha = 0.5, knot.pos = 0.3) +
  geom_stratum() + geom_text(stat = "stratum", label.strata = TRUE) +
  theme_minimal() +
  ggtitle("Physical Activity during the week and weekend") +
  theme(legend.position = "none", panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), axis.text.y = element_blank(), 
        axis.text.x = element_text(size = 12, face = "bold"))

Here is the visualization: [image: alluvial diagram produced by the code above]

There are three things I really couldn't do:

  1. Sort the Category axis so that the week comes first and the weekend after it: Working, Non Working, Sleep Week, Leisure, Sleep Weekend.

  2. Sort Positions/Movements as Sitting, Lying, Standing, Moving, Stairs, Walk Slow, Walk Fast, Running. I would also like to fill the boxes of this column with the same colours as the flows. Some names don't fit inside their boxes; is it possible to resize the boxes to accommodate them, or to put the labels outside with an arrow pointing to the box they belong to? Is there also a way to assign colours manually to each value, such as black for Walk Slow? And, if possible, I would like to remove the border lines from the flows.

  3. Is there a way to stack the names Positions and Movements on two lines?

Is there any other way to improve this visualization and make it look better?

Thanks in advance, Luiz


Solution

  • Here's a solution that fixes some of your problems.

    library(tidyverse)   # read_csv() and ggplot2
    library(ggalluvial)  # geom_alluvium(), geom_stratum()

    # A_col ... H_col are the colours defined in the question's code.
    df <- read_csv('Desktop/plot_alluvial_category_position_plus_moviments.csv')

    # Fix the display order of the strata by setting the factor levels explicitly.
    positions <- c("Sitting", "Lying", "Standing", "Moving", "Stairs", "Walk Slow",
                   "Walk Fast", "Running")
    df$Positions <- factor(df$Positions, levels = positions, labels = positions)
    category <- c("Working", "Non Working", "Sleep Week", "Leisure", 
                  "Sleep Weekend")
    df$Category <- factor(df$Category, levels = category, labels = category)
    
    ggplot(df,
           aes(y = Time, axis1 = Activity, axis2 = Category, axis3 = Positions)) +
      # No color aesthetic here, so the flows are drawn without borders.
      geom_alluvium(aes(fill = Positions), 
                    width = 4/12, alpha = 0.5, knot.pos = 0.3) +
      geom_stratum(width = 4/12, color = "grey36") +
      # min.height suppresses labels of strata that are too small to hold them.
      # On newer ggalluvial releases label.strata may be deprecated;
      # aes(label = after_stat(stratum)) is the documented alternative there.
      geom_text(stat = "stratum", label.strata = TRUE, min.height = 100) +
      scale_x_continuous(breaks = 1:3, 
                         labels = c("Activity", "Category", "Positions\nMovements"),
                         expand = c(.01, .05)) +
      ylab("Time 24 hours") +
      scale_fill_manual(values  = c(A_col, B_col, C_col, D_col, E_col, F_col, G_col, H_col)) +
      scale_color_manual(values = c(A_col, B_col, C_col, D_col, E_col, F_col, G_col, H_col)) +
      ggtitle("Physical activity during the week and weekend") +
      theme_minimal() +
      theme(legend.position = "none", panel.grid.major = element_blank(), 
            panel.grid.minor = element_blank(), axis.text.y = element_blank(), 
            axis.text.x = element_text(size = 12, face = "bold"))
    
    1. To sort your strata, convert the Category and Positions columns to factors and set the order of the levels explicitly.
    2. To remove the borders of the flows, it's enough to drop color = Positions from the aes() call in geom_alluvium().
    3. You can stack the names Positions and Movements by putting a newline character (\n) in the axis label.
    4. You can assign the colors to strata, but only if the categories are the same on every axis (check some examples in the ggalluvial documentation).
    5. To avoid overlapping labels in the small strata, pass the min.height argument to geom_text(stat = "stratum", ...); it was introduced in ggalluvial version 0.9.2 and is used in the code above.
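
As a rough illustration of point 1, and of the question about giving a specific value such as Walk Slow its own colour, here is a minimal base R sketch. The `toy` data frame and its values are made up for the example (they are not the asker's CSV), and the colour assignments in `position_cols` are only an assumption about which colour should go with which level.

```r
# Toy stand-in for the real data (hypothetical values, not the asker's CSV)
toy <- data.frame(
  Positions = c("Running", "Sitting", "Walk Slow", "Lying", "Standing"),
  Time      = c(10, 300, 45, 420, 120),
  stringsAsFactors = FALSE
)

# Desired display order, as listed in the question
position_levels <- c("Sitting", "Lying", "Standing", "Moving", "Stairs",
                     "Walk Slow", "Walk Fast", "Running")

# factor() with explicit levels fixes the order (instead of alphabetical)
toy$Positions <- factor(toy$Positions, levels = position_levels)

# Named colour vector: each level is tied to a colour by name, not by position
# (illustrative assignment; "black" for Walk Slow as asked in the question)
position_cols <- c(
  "Sitting"   = "firebrick3",
  "Lying"     = "darkorange",
  "Standing"  = "aquamarine2",
  "Moving"    = "dodgerblue2",
  "Stairs"    = "darkviolet",
  "Walk Slow" = "black",
  "Walk Fast" = "goldenrod1",
  "Running"   = "gray73"
)

# Sanity check: every level has a colour
stopifnot(all(levels(toy$Positions) %in% names(position_cols)))

# Sorting a factor follows the level order, not the alphabet
sorted_positions <- as.character(sort(toy$Positions))
print(sorted_positions)
```

Passing `values = position_cols` to `scale_fill_manual()` should then match colours to levels by name, so a colour stays attached to its value even if the level order changes later.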