I have created a Sankey diagram using ggplot2 and geom_sankey from the ggsankey package. The visual comes out as expected, but the columns are not in the order that I would like to see.
I would like to arrange the nodes of the middle column (out of 3 columns) in an order of my choosing rather than the default order that is assigned automatically. I would also like to get rid of the NA in the middle column.
Here is an image of the result with the column I would like to order:
The order of the second column that I would like to see is the following (starting from the top): Bridal, Make-Up, Maternity, Shoes, Suits, Winter, Games, Vehicle.
I have found a post on how to do this: "Rearrange the order of nodes in a sankey diagram using ggsankey". That solution sets the factor levels in the desired order, like this: dflong$next_node <- factor(dflong$next_node, levels = c("C","B","A")).
However, that solution only works if the data has the same levels for node and next_node, and my data does not. Upon implementing that solution, it erased a lot of the sankey diagram.
How can I do this?
Reproducible Example
devtools::install_github("davidsjoberg/ggsankey")
library(ggsankey); library(ggplot2); library(dplyr) # dplyr provides %>% and filter()
#Making a data frame
Years <- data.frame(Year = c(rep(2010, 5), rep(2011, 12), rep(2012, 2), rep(2013, 4), rep(2014, 5), rep(2015, 5), rep(NA, 3), rep(2022, 3), rep(NA, 2)),
Department = c(rep("Shoes", 4), rep("Bridal", 6), rep("Maternity", 10), rep("Winter", 3), rep("Make-Up", 6), rep("Suits", 3), rep("Vehicle", 1), rep("NA", 2), rep("Games", 4), rep(NA, 2)),
Description = c(rep("Place on feet", 4), rep("Wedding Dresses", 3), rep("Flowerly", 3), rep("Stretchy", 5), rep("Comfortable", 5), rep("Thick Socks", 3), rep("Foundation", 3), rep("Lipstick", 3), rep("Full Gear", 3), rep("Sedan", 1), rep("Electric", 2), rep("PC", 2), rep("Console", 2), rep(NA, 2)))
df_stack <- Years %>% make_long(Year, Department, Description) %>% filter(!is.na(node))
#graphing
ggplot(df_stack, aes(x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = factor(node),
label = node,
color = factor(node))) +
geom_sankey(flow.alpha = 0.5, node.color = 1,
smooth = 6, width = 0.2) + #width = width of nodes
geom_sankey_label(size = 3.5, color = 1, fill = "white") +
scale_fill_viridis_d(direction = -1, option = "turbo") +
scale_colour_viridis_d(direction = -1, option = "turbo") +
theme_sankey(base_size = 15) +
theme(legend.position = "none") + xlab('')
To get a specific order, convert node to a factor whose levels are set in your desired order. In the code below I simply use the order in which the values appear in the original dataset. Concerning your second issue: besides true missing values, your data contains the character string "NA", which you also have to account for when filtering your data.
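As a quick illustration of the "NA" point, here is a minimal base R sketch. The vector x is a made-up stand-in for the node column, not your actual data. It shows that is.na() only catches true missing values, so the string "NA" has to be excluded separately:

```r
# Made-up stand-in for the node column: one real label,
# one character string "NA", and one true missing value.
x <- c("Games", "NA", NA)

# Only the true missing value is flagged; the string "NA" is not.
na_flags <- is.na(x)
na_flags

# Dropping both kinds needs two conditions.
kept <- x[!is.na(x) & x != "NA"]
kept
```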
library(ggsankey)
library(ggplot2)
library(dplyr, warn.conflicts = FALSE)
levels <- Years |>
lapply(\(x) unique(x[!is.na(x)])) |>
Reduce(union, x = _)
df_stack <- Years %>%
make_long(Year, Department, Description) %>%
filter(node != "NA", !is.na(node)) |>
mutate(node = factor(node, levels))
ggplot(df_stack, aes(
x = x,
next_x = next_x,
node = node,
next_node = next_node,
fill = node,
label = node,
color = node
)) +
geom_sankey(
flow.alpha = 0.5, node.color = 1,
smooth = 6, width = 0.2,
) + # width = width of nodes
geom_sankey_label(size = 3.5, color = 1, fill = "white") +
scale_fill_viridis_d(direction = -1, option = "turbo") +
scale_colour_viridis_d(direction = -1, option = "turbo") +
theme_sankey(base_size = 15) +
theme(legend.position = "none") +
xlab(NULL)
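To use the order from the question instead of the order of appearance, one option is to put the Department values first in the level vector and append everything else unchanged. The sketch below assumes this is enough because the Department labels do not overlap with the Year or Description labels, so only their relative order within the middle column matters. custom_levels() is a hypothetical helper written for this example, not part of ggsankey, and nodes is a small stand-in so the snippet runs on its own:

```r
# Desired order of the middle (Department) column, taken from the question.
dept_order <- c("Bridal", "Make-Up", "Maternity", "Shoes",
                "Suits", "Winter", "Games", "Vehicle")

# Hypothetical helper: put the values listed in `first` at the front of the
# level set and keep every other value in its order of appearance.
custom_levels <- function(values, first) {
  values <- unique(as.character(values[!is.na(values)]))
  c(intersect(first, values), setdiff(values, first))
}

# Small stand-in for df_stack$node so the sketch runs without the real data.
nodes <- c("2010", "Shoes", "Place on feet", "2011", "Bridal",
           "Wedding Dresses", "Winter", "Thick Socks")
node_levels <- custom_levels(nodes, dept_order)
node_levels

# With the real data (assumes df_stack and dplyr from the code above):
# df_stack <- df_stack |>
#   mutate(node = factor(node, levels = custom_levels(node, dept_order)))
```

I have not verified in which direction the nodes are stacked for a given level order. If the middle column comes out upside down relative to the list in the question, passing rev(dept_order) instead of dept_order should flip it.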