r, ggraph, tidygraph

Colouring by variable when using tidygraph in R?


I am trying to find a way to colour multiple tidygraph plots consistently. The issue is that when I draw several plots at once, each plot picks its own colour for each variable. Hopefully my example below will explain the issue.

To begin, I create some data, turn them into tidygraph objects, and put them together into a list:

library(tidygraph)
library(ggraph)
library(gridExtra)

# create some data for the tbl_graph
nodes <- data.frame(name = c("x4", NA, NA),
                    label = c("x4", 5, 2))

nodes1 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA),
                    label = c("x4", "x2", 2,   1, "x1", 2, 7))

edges <- data.frame(from = c(1,1), to = c(2,3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
                    to    = c(2, 3, 4, 5, 6, 7))

# create the tbl_graphs
tg <- tbl_graph(nodes = nodes, edges = edges)
tg_1 <- tbl_graph(nodes = nodes1, edges = edges1)


# put into list
myList <- list(tg, tg_1)

Then I have a plotting function that allows me to display all the plots at once. I do this using grid.arrange from the gridExtra package, like so:

plotFun <- function(List) {
  ggraph(List, "partition") +
    geom_node_tile(aes(fill = name), size = 0.25) +
    geom_node_label(aes(label = label, color = name)) +
    scale_y_reverse() +
    theme_void() +
    theme(legend.position = "none")
}

# Display all plots
allPlots <- lapply(myList, plotFun)
n <- length(allPlots)
nRow <- floor(sqrt(n))
do.call("grid.arrange", c(allPlots, nrow = nRow))

This produces something like the following:

[image: example plot]

As you can see, each plot assigns colours to the variable names independently. As a result, the same variable is coloured differently in each plot. For example, x4 is red in the first plot and blue in the second.

I'm trying to find a way to make the colour for each variable consistent across all plots. Maybe using grid.arrange isn't the best solution?

Any help is appreciated.


Solution

  • Since each plot doesn't know anything about the other plots, it's best to assign the colors yourself. First, extract all the node names and assign each one a color:

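    # Collect the unique, non-NA node names across all graphs
    nodenames <- unique(na.omit(unlist(lapply(myList, . %>% activate(nodes) %>% pull(name)))))
    # Map each name to a color; the result is a named character vector
    nodecolors <- setNames(
      scales::hue_pal(h = c(0, 360) + 15, c = 100, l = 64, h.start = 0, direction = 1)(length(nodenames)),
      nodenames
    )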
    nodecolors 
    #        x4        x2        x1 
    # "#F5736A" "#00B734" "#5E99FF"
    

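    We use scales::hue_pal to get the "default" ggplot colors, but you could use whatever palette you like. Because the vector is named, the manual scales match colors to node names, so a given name gets the same color in every plot. Then we just need to customize the color and fill scales of the plots with these colors.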

    plotFun <- function(List, colors = NULL) {
      plot <- ggraph(List, "partition") +
        geom_node_tile(aes(fill = name), size = 0.25) +
        geom_node_label(aes(label = label, color = name)) +
        scale_y_reverse() +
        theme_void() +
        theme(legend.position = "none")
      # Apply the shared name-to-color mapping only when one is supplied
      if (!is.null(colors)) {
        plot <- plot +
          scale_fill_manual(values = colors) +
          scale_color_manual(values = colors, na.value = "grey")
      }
      plot
    }
    allPlots <- lapply(myList, plotFun, colors=nodecolors)
    n <- length(allPlots)
    nRow <- floor(sqrt(n))
    do.call("grid.arrange", c(allPlots, nrow = nRow))
    

    [image: plot with coordinating node name colors]
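
To see why a named color vector keeps things consistent, here is a minimal sketch that needs only the scales package. The helper make_node_palette is hypothetical (it is not part of the answer above), and it uses hue_pal() with its default arguments rather than the exact values shown earlier. The lookup by name is meant to mimic what a manual scale does with named values; it is an illustration, not the scale's actual implementation.

```r
library(scales)  # provides hue_pal(), as used in the answer above

# Hypothetical helper (not part of the original answer): build a named
# color vector from node names, ignoring NA and duplicates.
make_node_palette <- function(node_names) {
  node_names <- unique(node_names[!is.na(node_names)])
  setNames(hue_pal()(length(node_names)), node_names)
}

# Node names as they appear in the two example graphs
names_plot1 <- c("x4", NA, NA)
names_plot2 <- c("x4", "x2", NA, NA, "x1", NA, NA)

# One palette built from the names of all plots together
node_palette <- make_node_palette(c(names_plot1, names_plot2))

# Each plot looks up its colors by name, not by position
colors_plot1 <- node_palette[names_plot1[!is.na(names_plot1)]]
colors_plot2 <- node_palette[names_plot2[!is.na(names_plot2)]]

# "x4" resolves to the same color in both plots
identical(colors_plot1[["x4"]], colors_plot2[["x4"]])
```

The point of building the palette once from all names is that the color of a name no longer depends on how many other names a particular plot happens to contain.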