geom_sankey in R: Ordering Columns

I have created a Sankey diagram using ggplot2 and geom_sankey() from the ggsankey package. The plot comes out as expected, but the nodes within the columns are not in the order I would like to see.

I would like to arrange the nodes of the middle column (out of 3 columns) in an order of my choosing, rather than the default order it applies automatically. I would also like to get rid of the NA node in the middle column.

Here is an image of the result with the column I would like to order:

The order of the second column that I would like to see is the following (starting from the top): Bridal, Make-Up, Maternity, Shoes, Suits, Winter, Games, Vehicle.

I have found a post on how to do this: Rearrange the order of nodes in a sankey diagram using ggsankey. That solution converts next_node to a factor whose levels are in the desired order, like this: dflong$next_node <- factor(dflong$next_node, levels = c("C","B","A")).

However, that solution only works if node and next_node have the same levels, and my data does not. When I applied it, much of the Sankey diagram disappeared.
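A plausible explanation for the disappearing flows, assuming the level vector did not list every value that occurs in next_node: factor() silently turns any value that is not among the supplied levels into NA. A minimal base-R illustration with made-up values:

```r
# Made-up next_node values; "Z" is deliberately absent from the levels below
next_node <- c("A", "B", "Z")

# Supplying an order that covers only some of the values
f <- factor(next_node, levels = c("C", "B", "A"))

# "Z" is not a level, so it has become NA
is.na(f)  # FALSE FALSE TRUE
```

Any flow whose next_node became NA this way would presumably no longer be drawn, so the level vector needs to contain every node that should stay in the plot.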

How can I do this?

Reproducible Example

devtools::install_github("davidsjoberg/ggsankey")
library(ggsankey); library(ggplot2); library(dplyr)  # dplyr provides %>% and filter()

#Making a data frame

Years <- data.frame(Year = c(rep(2010, 5), rep(2011, 12), rep(2012, 2), rep(2013, 4), rep(2014, 5), rep(2015, 5), rep(NA, 3), rep(2022, 3), rep(NA, 2)),
                    Department = c(rep("Shoes", 4), rep("Bridal", 6), rep("Maternity", 10), rep("Winter", 3), rep("Make-Up", 6), rep("Suits", 3), rep("Vehicle", 1), rep("NA", 2), rep("Games", 4), rep(NA, 2)),
                    Description = c(rep("Place on feet", 4), rep("Wedding Dresses", 3), rep("Flowerly", 3), rep("Stretchy", 5), rep("Comfortable", 5), rep("Thick Socks", 3), rep("Foundation", 3), rep("Lipstick", 3), rep("Full Gear", 3), rep("Sedan", 1), rep("Electric", 2), rep("PC", 2), rep("Console", 2), rep(NA, 2)))
                                    
df_stack <- Years %>% make_long(Year, Department, Description) %>% filter(!is.na(node))

#graphing

ggplot(df_stack, aes(x = x, 
                    next_x = next_x,
                    node = node,
                    next_node = next_node, 
                    fill = factor(node), 
                    label = node,
                    color = factor(node))) + 
  geom_sankey(flow.alpha = 0.5, node.color = 1, 
              smooth = 6, width = 0.2) +  # width = width of nodes
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d(direction = -1, option = "turbo") + 
  scale_colour_viridis_d(direction = -1, option = "turbo") +
  theme_sankey(base_size = 15) +
  theme(legend.position = "none") + xlab('')

Solution

To get a specific order, convert node to a factor whose levels are set in your desired order. In the code below I simply use the order in which the values first appear in the original dataset. Concerning your second issue: besides true missing values, your data contains some character strings "NA", which you have to account for when filtering your data.
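A small base-R illustration, with made-up values, of why filter(!is.na(node)) alone keeps the "NA" node: is.na() only detects true missing values, not the two-character string "NA".

```r
# A true missing value and the string "NA" are different things
node <- c("Games", "NA", NA)

is.na(node)   # FALSE FALSE TRUE: only the real NA counts as missing
node != "NA"  # TRUE FALSE NA: comparing a real NA gives NA

# Keep values that are neither missing nor the string "NA";
# dplyr::filter() likewise keeps only rows where every condition is TRUE
kept <- node[!is.na(node) & node != "NA"]
kept          # "Games"
```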

library(ggsankey)
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)

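# Unique non-missing values of each column in order of first appearance,
# combined across columns (Year, then Department, then Description)
node_levels <- Years |> 
  lapply(\(x) unique(x[!is.na(x)])) |> 
  Reduce(union, x = _)

df_stack <- Years |>
  make_long(Year, Department, Description) |>
  # drop both true missing values and the character string "NA"
  filter(node != "NA", !is.na(node)) |> 
  mutate(node = factor(node, levels = node_levels))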

ggplot(df_stack, aes(
  x = x,
  next_x = next_x,
  node = node,
  next_node = next_node,
  fill = node,
  label = node,
  color = node
)) +
  geom_sankey(
    flow.alpha = 0.5, node.color = 1,
    smooth = 6, width = 0.2 # width = width of nodes
  ) +
  geom_sankey_label(size = 3.5, color = 1, fill = "white") +
  scale_fill_viridis_d(direction = -1, option = "turbo") +
  scale_colour_viridis_d(direction = -1, option = "turbo") +
  theme_sankey(base_size = 15) +
  theme(legend.position = "none") +
  xlab(NULL)
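To use the order from the question for the middle column instead of the dataset order, one option is to build the level vector by hand for Department and keep the first-appearance order for the other two columns. This is a sketch; department_order and the helper present() are names introduced here for illustration, and only base R is used so the levels can be checked on their own.

```r
# Desired top-to-bottom order of the Department column (from the question)
department_order <- c("Bridal", "Make-Up", "Maternity", "Shoes",
                      "Suits", "Winter", "Games", "Vehicle")

# Same data as in the question
Years <- data.frame(
  Year = c(rep(2010, 5), rep(2011, 12), rep(2012, 2), rep(2013, 4), rep(2014, 5),
           rep(2015, 5), rep(NA, 3), rep(2022, 3), rep(NA, 2)),
  Department = c(rep("Shoes", 4), rep("Bridal", 6), rep("Maternity", 10),
                 rep("Winter", 3), rep("Make-Up", 6), rep("Suits", 3),
                 rep("Vehicle", 1), rep("NA", 2), rep("Games", 4), rep(NA, 2)),
  Description = c(rep("Place on feet", 4), rep("Wedding Dresses", 3),
                  rep("Flowerly", 3), rep("Stretchy", 5), rep("Comfortable", 5),
                  rep("Thick Socks", 3), rep("Foundation", 3), rep("Lipstick", 3),
                  rep("Full Gear", 3), rep("Sedan", 1), rep("Electric", 2),
                  rep("PC", 2), rep("Console", 2), rep(NA, 2))
)

# Hypothetical helper: unique non-missing values of a column, as character
present <- function(x) unique(as.character(x[!is.na(x)]))

# Every department that survives the filter must be listed in department_order,
# otherwise factor() would turn it into NA and its flows would vanish
missing_departments <- setdiff(present(Years$Department),
                               c(department_order, "NA"))
stopifnot(length(missing_departments) == 0)

# Year levels first, then the hand-picked Department order, then Description
node_levels <- unique(c(present(Years$Year),
                        department_order,
                        present(Years$Description)))
```

node_levels can then be passed as the levels argument of factor() in the mutate() step of the solution. I have not verified whether ggsankey stacks the first level at the top or at the bottom of a column, so if the middle column comes out upside down, using rev(department_order) should flip it.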