I want to create a Sankey diagram in R with highcharter, with 3 columns showing how people move between low and high measurements across 3 different years.
This is a mock version of my table, along with the code for the Sankey:
library(highcharter)
dat <- cbind(c("1.Low", "1.Low","1.High", "1.High", "2.Low", "2.High", "2.High"),
c("2.Low", "2.High", "2.Low", "2.High", "3.High", "3.Low", "3.High"),
c(5, 10, 15, 5, 1, 10, 15))
dat<- as.data.frame(dat)
colnames(dat)<- c("from", "to", "weight")
dat$weight<- as.numeric(dat$weight)
hchart(dat, "sankey")
which gives me this Sankey diagram:
[Image: Sankey diagram with 3 columns of "high" and "low"]
I want to do 3 things:
1. Change the labels to remove the numbers in front of them. I added the numbers to differentiate between the columns, because otherwise the diagram would be treated as having only 2 columns (low and high), but I don't want them in my final diagram.
2. **Reorder** the "high" and "low" in the last column.
3. Make all "highs" the same color, and all "lows" the same color. Is that possible?
So far I've been trying to fiddle with the highcharter options, but I find the documentation very confusing and lacking useful examples for Sankey.
I've tried these options, but they don't work. Any and all ideas appreciated.
hchart(dat, "sankey") %>%
hc_add_theme(hc_theme_ggplot2()) %>%
hc_plotOptions(series = list(dataLabels = list( style = list(fontSize = "10px")))) %>%
hc_plotOptions(sankey = list(colorByPoint = FALSE,
curveFactor = 0.5,
linkOpacity = 0.33)) %>%
hc_add_series(nodes= list(id = '1.High', color = "green"),
list(id = '1.Low', color = "blue"),
list(id = '2.High', color = "green"),
list(id = '2.Low', color = "blue"),
list(id = '3.High', color = "green"),
list(id = '3.Low', color = "blue"))
Here is one approach to achieve your desired result.
1. To reorder the nodes, sort your data by `to` such that the "low"s come first, then by `from` such that the "low"s come first.
2. To set the colors and labels, use the `nodes=` attribute. However, instead of setting these manually for each node, you could use `lapply` to create the `list` of individual node options.
dat <- data.frame(
c("1.Low", "1.Low", "1.High", "1.High", "2.Low", "2.High", "2.High"),
c("2.Low", "2.High", "2.Low", "2.High", "3.High", "3.Low", "3.High"),
c(5, 10, 15, 5, 1, 10, 15)
)
colnames(dat) <- c("from", "to", "weight")
library(highcharter)
# Sort by the `to` label, then the `from` label (prefix stripped), with "Low" before "High"
dat <- dat[order(
gsub("\\d+\\.\\s?", "", dat$to),
gsub("\\d+\\.\\s?", "", dat$from),
decreasing = TRUE
), ]
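# One options list per node: keep the prefixed id, color by level, show the label without the prefix
nodes <- unique(c(dat$from, dat$to)) |>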
lapply(\(x) {
list(
id = x,
color = if (grepl("High", x)) "green" else "blue",
name = gsub("\\d+\\.\\s?", "", x)
)
})
highchart() |>
hc_add_series(
data = dat, type = "sankey",
hcaes(from = from, to = to, weight = weight),
nodes = nodes
) |>
hc_plotOptions(
series = list(dataLabels = list(style = list(fontSize = "10px"))),
sankey = list(
curveFactor = 0.5,
linkOpacity = 0.33
)
)
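To check what the reordering and relabelling steps produce before drawing anything, the sketch below runs them with base R only. The helper name `strip_prefix` is hypothetical (it is not part of highcharter or the code above); it just wraps the same `gsub` pattern. It assumes the node ids follow the "digit, dot, label" scheme from the question.

```r
# Hypothetical helper (not part of highcharter): drop the "1.", "2.", ... prefix.
strip_prefix <- function(x) gsub("\\d+\\.\\s?", "", x)

dat <- data.frame(
  from = c("1.Low", "1.Low", "1.High", "1.High", "2.Low", "2.High", "2.High"),
  to = c("2.Low", "2.High", "2.Low", "2.High", "3.High", "3.Low", "3.High"),
  weight = c(5, 10, 15, 5, 1, 10, 15),
  stringsAsFactors = FALSE
)

# Same ordering as above: with decreasing = TRUE, "Low" sorts before "High".
sorted <- dat[order(strip_prefix(dat$to), strip_prefix(dat$from), decreasing = TRUE), ]

# Node options: the id keeps the prefix (unique per column), the name drops it.
node_ids <- unique(c(sorted$from, sorted$to))
nodes <- lapply(node_ids, function(x) {
  list(
    id = x,
    color = if (grepl("High", x)) "green" else "blue",
    name = strip_prefix(x)
  )
})

# Inspect the sorted target labels and the id / name / color of each node.
print(strip_prefix(sorted$to))
print(vapply(nodes, function(n) paste(n$id, n$name, n$color), character(1)))
```

Printing these two vectors is a quick way to confirm that every "Low" row comes before every "High" row and that each node gets the intended label and color before passing `nodes` to `hc_add_series()`.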