
Create groups and change node and link colors in an interactive Sankey diagram with networkD3


How can I create groups for nodes and links and color them accordingly in Sankey plots made with networkD3 in R? This excellent example shows the data-formatting steps. Here are the code and plot from that example; I want to color this plot by group.

df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
name  year1           year2         year3           year4
Bob   Hilton          Sheraton      Westin          Hyatt
John  "Four Seasons"  Ritz-Carlton  Westin          Sheraton
Tom   Ritz-Carlton    Westin        Sheraton        Hyatt
Mary  Westin          Sheraton      "Four Seasons"  Ritz-Carlton
Sue   Hyatt           Ritz-Carlton  Hilton          Sheraton
Barb  Hilton          Sheraton      Ritz-Carlton    "Four Seasons"
')

Format the data frame and create the Sankey plot:

library(dplyr)
library(tidyr)

links <-
  df %>%
  mutate(row = row_number()) %>%  # add a row id
  pivot_longer(-row, names_to = "column", values_to = "source") %>%  # gather all columns
  mutate(column = match(column, names(df))) %>%  # convert col names to col ids
  group_by(row) %>%
  mutate(target = lead(source, order_by = column)) %>%  # get target from following node in row
  ungroup() %>% 
  filter(!is.na(target))  # remove links from last column in original data
links <-
  links %>%
  mutate(source = paste0(source, '_', column)) %>%
  mutate(target = paste0(target, '_', column + 1)) %>%
  select(source, target)
nodes <- data.frame(name = unique(c(links$source, links$target)))
nodes$label <- sub('_[0-9]*$', '', nodes$name) # remove column id from node label
links$source_id <- match(links$source, nodes$name) - 1
links$target_id <- match(links$target, nodes$name) - 1
links$value <- 1

library(networkD3)

sankeyNetwork(Links = links, Nodes = nodes, Source = 'source_id',
              Target = 'target_id', Value = 'value', NodeID = 'label')
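If you would rather not depend on dplyr/tidyr, the same reshaping can be sketched in base R. This is my substitution, not the method from the linked example. It builds one set of links for each pair of adjacent columns, appends the column id to every name so that the same hotel in different years becomes a separate node, and converts names to the zero-based indices that sankeyNetwork() expects. It should give the same 24 links and 23 nodes as the code above, only in a different row order.

```r
# Base-R sketch of the reshaping above (no dplyr/tidyr required).
df <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
name  year1           year2         year3           year4
Bob   Hilton          Sheraton      Westin          Hyatt
John  "Four Seasons"  Ritz-Carlton  Westin          Sheraton
Tom   Ritz-Carlton    Westin        Sheraton        Hyatt
Mary  Westin          Sheraton      "Four Seasons"  Ritz-Carlton
Sue   Hyatt           Ritz-Carlton  Hilton          Sheraton
Barb  Hilton          Sheraton      Ritz-Carlton    "Four Seasons"
')

# One block of links per pair of adjacent columns (col -> col + 1); the
# "_<column id>" suffix keeps e.g. Hilton in year1 and year3 as distinct nodes.
links <- do.call(rbind, lapply(seq_len(ncol(df) - 1), function(col) {
  data.frame(source = paste0(df[[col]], "_", col),
             target = paste0(df[[col + 1]], "_", col + 1),
             stringsAsFactors = FALSE)
}))

nodes <- data.frame(name = unique(c(links$source, links$target)),
                    stringsAsFactors = FALSE)
nodes$label <- sub("_[0-9]*$", "", nodes$name)  # display label without column id

# networkD3 expects zero-based integer indices into the nodes data frame.
links$source_id <- match(links$source, nodes$name) - 1
links$target_id <- match(links$target, nodes$name) - 1
links$value <- 1
```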

How can I create groups and change the node and link colors by group in the Sankey plot above?


Solution

  • Add a column to the links and to the nodes data frames that specifies the group of each row, then pass those column names to the LinkGroup and NodeGroup arguments (respectively) of sankeyNetwork().

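    For example, using your code/data above, I add a link_group column to the links data frame and a node_group column to the nodes data frame, both holding the column id that follows the underscore in the node name (the source node's name for links), so each column of the original data gets its own color. Then I specify them in the call to sankeyNetwork():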

    links <- links %>% mutate(link_group = sub(".*_", "", source))
    nodes <- nodes %>% mutate(node_group = sub(".*_", "", name))
    
    sankeyNetwork(Links = links, Nodes = nodes, Source = 'source_id',
                  Target = 'target_id', Value = 'value', NodeID = 'label',
                  LinkGroup = "link_group", NodeGroup = "node_group")
    

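If you would rather color by hotel than by column, and choose the colors yourself, a sketch along these lines may work. It uses a small toy version of the links and nodes data frames (same "<label>_<column id>" naming) so it runs on its own. Coloring each link by the hotel it flows into, the Okabe-Ito palette, and the names groups, palette and colour_js are my assumptions. The colourScale argument of sankeyNetwork() takes a D3 ordinal scale written as a JavaScript string, wrapped with JS().

```r
# Toy stand-in for the links/nodes data frames built above, using the same
# "<label>_<column id>" naming (illustrative data only).
nodes <- data.frame(name = c("Bob_1", "Sue_1", "Hilton_2", "Hyatt_2", "Sheraton_3"),
                    stringsAsFactors = FALSE)
nodes$label <- sub("_[0-9]*$", "", nodes$name)  # drop the column id
links <- data.frame(source = c("Bob_1", "Sue_1", "Hilton_2", "Hyatt_2"),
                    target = c("Hilton_2", "Hyatt_2", "Sheraton_3", "Sheraton_3"),
                    stringsAsFactors = FALSE)
links$source_id <- match(links$source, nodes$name) - 1  # zero-based indices
links$target_id <- match(links$target, nodes$name) - 1
links$value <- 1

# Group nodes by hotel (the label) instead of by column id.
nodes$node_group <- nodes$label
# Assumption: colour each link by the hotel it flows into.
links$link_group <- sub("_[0-9]*$", "", links$target)

# Map every group to an explicit colour (Okabe-Ito palette, recycled if needed).
groups <- unique(c(nodes$node_group, links$link_group))
palette <- rep_len(c("#E69F00", "#56B4E9", "#009E73", "#F0E442",
                     "#0072B2", "#D55E00", "#CC79A7", "#999999"),
                   length(groups))
colour_js <- sprintf('d3.scaleOrdinal().domain(["%s"]).range(["%s"])',
                     paste(groups, collapse = '", "'),
                     paste(palette, collapse = '", "'))

# Build the widget only if networkD3 is installed; type `p` in the console to view it.
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::sankeyNetwork(Links = links, Nodes = nodes,
                                Source = "source_id", Target = "target_id",
                                Value = "value", NodeID = "label",
                                NodeGroup = "node_group", LinkGroup = "link_group",
                                colourScale = htmlwidgets::JS(colour_js))
}
```

Because the same hotel name maps to the same color in every column, this makes it easy to follow one hotel across years; with the full data you would build groups from all labels in the same way.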