r plotly sankey-diagram

Sankey Diagram in R & plotly: unexpected connections


I am trying to build a Sankey plot with three layers in R using the plotly package (plotly_4.10.2). Although the source-to-target connections in the "links" data look reasonable, the plot itself displays the connections incorrectly.

For example, in "example.data", Gene3-Treatment-Category2 is displayed as Gene3-Treatment-Category1. The connections for Gene8 are wrong as well. Should I rearrange the labels before plotting?

[Screenshot of the plot]

library(plotly)

# this is an example data

example.data <- data.frame(
  genes = c("Gene1", "Gene2", "Gene3", "Gene4", "Gene5", "Gene6", "Gene7", "Gene8", "Gene9"),
  conditions = c("Control", "Control", "Treatment", "Treatment", "Treatment", "Treatment", "Treatment", "Treatment", "Treatment"),
  category = c("Category1", "Category1", "Category2", "Category2", "Category2", "Category2", "Category2", "Category1", "Category2")
)

nodes <- data.frame(name = unique(c(as.character(example.data$genes),
                                    as.character(example.data$conditions),
                                    as.character(example.data$category))))

links <- data.frame(source = match(example.data$genes, nodes$name) - 1,
                    target = match(example.data$conditions, nodes$name) - 1,
                    stringsAsFactors = FALSE)

links <- rbind(links,
               data.frame(source = match(example.data$conditions, nodes$name) - 1,
                          target = match(example.data$category, nodes$name) - 1,
                          stringsAsFactors = FALSE))


plotly::plot_ly(
  type = "sankey",
  domain = list(x =  c(0,1),
                y =  c(0,1)),
  orientation = "h",
  customdata = nodes$name,
  node = list(
    label = nodes$name,
    pad = 15,
    thickness = 15,
    line = list(color = "black",
                width = 0.5)),
  link = list(source = links$source,
              target = links$target,
              value =   rep(1, nrow(links))
  ))

Solution

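  • In the genes -> conditions -> category layout, every gene first merges into a shared condition node. A Sankey node only sums the flow passing through it, so the plot cannot show which gene continues to which category. Try plotting in this order instead, so that each gene keeps its own node in the middle: conditions -> genes -> category: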

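    # All distinct labels across the three columns, in column order
    nodes <- unique(unlist(example.data))
    
    # plotly expects zero-based node indices, hence the "- 1"
    links <- list(
      # first layer: conditions -> genes; second layer: genes -> category
      source = c(match(example.data$conditions, nodes) - 1, 
                 match(example.data$genes, nodes) - 1),
      target = c(match(example.data$genes, nodes) - 1,
                 match(example.data$category, nodes) - 1),
      # every link carries the same weight
      value = rep(1, nrow(example.data) * 2))
    
    plot_ly(type = "sankey",
            node = list(label = nodes),
            link = links)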
    

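To see why the genes -> conditions -> category layout loses the per-gene paths, it may help to tabulate the condition-to-category links. The sketch below rebuilds `example.data` from the question in a more compact form; `cond_to_cat` is only an illustrative name, not something from the original code.

```r
# Rebuild the question's example data (same values, written more compactly).
example.data <- data.frame(
  genes = paste0("Gene", 1:9),
  conditions = c("Control", "Control", rep("Treatment", 7)),
  category = c("Category1", "Category1", "Category2", "Category2", "Category2",
               "Category2", "Category2", "Category1", "Category2"),
  stringsAsFactors = FALSE
)

# Count how many genes share each condition -> category link.
cond_to_cat <- as.data.frame(
  table(condition = example.data$conditions,
        category = example.data$category),
  stringsAsFactors = FALSE
)

# Keep only the combinations that actually occur in the data.
cond_to_cat <- cond_to_cat[cond_to_cat$Freq > 0, ]
print(cond_to_cat)
```

With this data, seven genes flow into `Treatment`, and `Treatment` then feeds both `Category1` and `Category2`. Nothing in those merged links records which gene goes where, so hovering over a gene cannot trace it to the right category. Placing the genes in the middle layer avoids the merge.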