How can I create groups for nodes and links and color them accordingly in Sankey plots using networkD3 in R? This excellent example shows the data-formatting steps. Here is the code and plot from that example; I want to color this plot by group.
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
name year1 year2 year3 year4
Bob Hilton Sheraton Westin Hyatt
John "Four Seasons" Ritz-Carlton Westin Sheraton
Tom Ritz-Carlton Westin Sheraton Hyatt
Mary Westin Sheraton "Four Seasons" Ritz-Carlton
Sue Hyatt Ritz-Carlton Hilton Sheraton
Barb Hilton Sheraton Ritz-Carlton "Four Seasons"
')
Format the data frame and create the Sankey plot:
library(dplyr)  # mutate(), group_by(), lead(), filter(), %>%
library(tidyr)  # pivot_longer()

links <-
  df %>%
  mutate(row = row_number()) %>% # add a row id
  pivot_longer(-row, names_to = "column", values_to = "source") %>% # gather all columns
  mutate(column = match(column, names(df))) %>% # convert col names to col ids
  group_by(row) %>%
  mutate(target = lead(source, order_by = column)) %>% # get target from following node in row
  ungroup() %>%
  filter(!is.na(target)) # remove links from last column in original data
links <-
  links %>%
  mutate(source = paste0(source, '_', column)) %>%
  mutate(target = paste0(target, '_', column + 1)) %>%
  select(source, target)
nodes <- data.frame(name = unique(c(links$source, links$target)))
nodes$label <- sub('_[0-9]*$', '', nodes$name) # remove column id from node label
links$source_id <- match(links$source, nodes$name) - 1
links$target_id <- match(links$target, nodes$name) - 1
links$value <- 1
library(networkD3)
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source_id',
              Target = 'target_id', Value = 'value', NodeID = 'label')
Add a column to both the links and the nodes data frames that specifies the group of each row, then pass those column names to the LinkGroup and NodeGroup arguments of sankeyNetwork(), respectively.
For example, using your code/data above, I add a link_group column to the links data frame and a node_group column to the nodes data frame, and then specify them in the call to sankeyNetwork():
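# group each link by the column id at the end of its source node name (e.g. "Hilton_2" -> "2")
links <- links %>% mutate(link_group = sub(".*_", "", source))
# group each node by the column id at the end of its name
nodes <- nodes %>% mutate(node_group = sub(".*_", "", name))
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source_id',
              Target = 'target_id', Value = 'value', NodeID = 'label',
              LinkGroup = "link_group", NodeGroup = "node_group")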
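With only LinkGroup and NodeGroup set, the colors come from networkD3's default ordinal scale. If you want to choose the color for each group yourself, one option is the colourScale argument of sankeyNetwork(), which accepts a D3 scale wrapped in JS(). The following is a minimal sketch, not part of the original example: it uses a small toy data set in the same shape as the data frames above so that it runs on its own, and the group_colours vector and its hex values are arbitrary assumptions for illustration.

```r
library(networkD3)

# Toy nodes/links in the same shape as the data frames built above
nodes <- data.frame(
  name = c("Hilton_2", "Westin_2", "Sheraton_3", "Hyatt_4"),
  stringsAsFactors = FALSE
)
nodes$label <- sub("_[0-9]*$", "", nodes$name)   # display label without column id
nodes$node_group <- sub(".*_", "", nodes$name)   # group = trailing column id

links <- data.frame(
  source = c("Hilton_2", "Westin_2", "Sheraton_3"),
  target = c("Sheraton_3", "Sheraton_3", "Hyatt_4"),
  stringsAsFactors = FALSE
)
links$source_id <- match(links$source, nodes$name) - 1  # zero-based indices
links$target_id <- match(links$target, nodes$name) - 1
links$value <- 1
links$link_group <- sub(".*_", "", links$source)

# Hypothetical mapping from group name to color (arbitrary choices)
group_colours <- c("2" = "#1b9e77", "3" = "#d95f02", "4" = "#7570b3")

# Build a D3 ordinal scale whose domain is the group names and
# whose range is the matching colors
colour_scale <- JS(sprintf(
  "d3.scaleOrdinal().domain([%s]).range([%s])",
  paste0('"', names(group_colours), '"', collapse = ", "),
  paste0('"', unname(group_colours), '"', collapse = ", ")
))

p <- sankeyNetwork(Links = links, Nodes = nodes, Source = "source_id",
                   Target = "target_id", Value = "value", NodeID = "label",
                   LinkGroup = "link_group", NodeGroup = "node_group",
                   colourScale = colour_scale)
p
```

Because links and nodes share one color scale, a link group and a node group with the same name get the same color. Use different group names for links and nodes if you want them colored independently.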