r bioinformatics phylogeny ggtree ape

Can you color branch lines in a tree based on node data? (ggtree)


Say I have a data frame that denotes node colors for a given tree (3 nodes for clarity, but I would like to expand this to 1000 nodes):

note_df
nodes color
node1 #0d3b66
node2 #faf0ca
node3 #f4d35e

And then I have a tree with three nodes:

library(ape)
library(ggplot2)
library(ggtree)
treex="((bat:10, cow:10):10,(elk:10, fox:10):10);"
treep=ape::read.tree(text=treex)
plot(treep)

Is there a way to use ggtree to color the lines based on the node data frame? The image shows the result of plot(treep), with the lines colored manually to illustrate the desired output. (Requires the ape, ggtree, and ggplot2 packages.)

[Image: output of plot(treep) with the branches colored by hand]

I know you can do this manually with ggtree, but I have 1000 nodes and this would take forever. Any help would be appreciated.


Solution

  • Coloring the branches is described here. There are two issues with your example. (1) The nodes in your image are not in the correct place: your node2 is actually node 5, your node1 is really node 7, and so on. This makes it difficult, because the node ids in your data frame need to be remapped to the ones the tree actually uses. (2) The colors in your node data frame do not match the ones in your image.

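    library(ape)
    library(ggplot2)
    library(ggtree)
    library(tidyverse)
    
    # Parse the four-tip example tree from its Newick string
    x <- ape::read.tree(text = "((bat:10, cow:10):10,(elk:10, fox:10):10);")
    
    # Colors keyed by node number (tips are numbered first, so 1:3 are bat, cow, elk)
    nc <- data.frame(node_id = 1:3, node_color = c("#0d3b66", "#faf0ca", "#f4d35e"))
    
    # One row per node; nodes without a match in nc get NA as node_color
    tree_data <- as_tibble(ggtree::fortify(x)) %>% left_join(nc, by = c("node" = "node_id"))
    
    # Attach the data and color each branch; I() uses the hex codes as-is
    ggtree(x) %<+% tree_data +
      geom_tree(aes(color = I(node_color))) +
      geom_label(aes(label = node)) +
      geom_tiplab()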
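
    To do the remapping from (1) without reading node numbers off a plot, you can look them up by the tips each node contains. A minimal sketch using only ape; `clades` and `node_lookup` are names made up for this illustration, and you would define the clades for your own tree:

    ```r
    library(ape)

    x <- ape::read.tree(text = "((bat:10, cow:10):10,(elk:10, fox:10):10);")

    # Tips are numbered 1..Ntip(x) in the order of x$tip.label;
    # internal nodes are numbered Ntip(x) + 1 .. Ntip(x) + Nnode(x).
    tip_ids <- setNames(seq_len(Ntip(x)), x$tip.label)

    # Hypothetical clade definitions: each internal node is identified
    # by the tips it contains.
    clades <- list(bat_cow = c("bat", "cow"), elk_fox = c("elk", "fox"))

    # getMRCA() returns the node number of the most recent common ancestor,
    # i.e. the internal node that the clade hangs from.
    node_lookup <- sapply(clades, function(tips) ape::getMRCA(x, tips))

    print(tip_ids)
    print(node_lookup)
    ```

    For this tree the lookup gives node 6 for (bat, cow) and node 7 for (elk, fox), which matches the parent column in tree_data. Those numbers are what a color table has to be keyed on.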
    

    Another way of mapping colors to branches

    Instead of node numbers, you can map your colors to the label column in tree_data. This data table, shown below, is used for building the tree and determining each node's color. In R you can join your desired colors on any of these columns; in the example below I use the label. The main issue is how the data frame that determines the colors is joined to these node ids. But since so much information is available (parent node, node id, isTip), it should be possible to build what you need from this answer.

    tree_data

    parent node branch.length label isTip  x   y branch angle node_color
         6    1            10 bat   TRUE  20 1.0     15    90 #0d3b66
         6    2            10 cow   TRUE  20 2.0     15   180 #faf0ca
         7    3            10 elk   TRUE  20 3.0     15   270 #f4d35e

    Mapping with labels

    x <- ape::read.tree(text = "((bat:10, cow:10):10,(elk:10, fox:10):10);")
    
    # Colors keyed by tip label instead of node number
    nc <- data.frame(label = c("fox", "elk", "cow", "bat"), node_color = c("lightgreen", "lightgreen", "orange", "orange"))
    
    # Join on the label column; internal nodes have no matching label and get NA
    tree_data <- as_tibble(ggtree::fortify(x)) %>% left_join(nc, by = "label")
    # Give every unmatched (internal) node a default color
    tree_data$node_color[is.na(tree_data$node_color)] <- "blue"
    
    ggtree(x) %<+% tree_data +
      geom_tree(aes(color = I(node_color))) +
      geom_label(aes(label = node)) +
      geom_tiplab()
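
    Both examples type the colors in by hand, which will not scale to 1000 nodes. A rough sketch of building the color table programmatically instead, assuming one color per node is wanted. The rainbow() palette is only a placeholder for whatever colors your real data supplies, and the plotting step reuses the same %<+% pattern as above:

    ```r
    library(ape)
    library(ggtree)

    x <- ape::read.tree(text = "((bat:10, cow:10):10,(elk:10, fox:10):10);")

    # Total number of nodes: tips first (1..Ntip), then internal nodes.
    n_nodes <- Ntip(x) + Nnode(x)

    # Assumption: any character vector of n_nodes valid R colors works here;
    # rainbow() just avoids typing each color by hand.
    nc <- data.frame(
      node = seq_len(n_nodes),
      node_color = grDevices::rainbow(n_nodes),
      stringsAsFactors = FALSE
    )

    # Attach the table to the tree and color each branch by its node's color.
    p <- ggtree(x) %<+% nc +
      geom_tree(aes(color = I(node_color))) +
      geom_tiplab()

    print(p)
    ```

    Because the table is generated from Ntip() and Nnode(), the same code should work unchanged for a larger tree; only the way the colors are chosen needs to be replaced with your own mapping.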
    
