rphylogenyphyloseq

Phylo tree has no branch lengths using phylosig () R


I am having a hard time understanding why my phylo tree (imported as a phyloseq object from QIIME2) appears to have no branch lengths when used in phylosig(). I am trying to compute the phylogenetic signal of my 16S dataset compared to a single continuous metadata variable. All example datasets are included at the bottom of this question.

    glacialpath <- metadata[,c(7)]
    
    phylosig(phylo, glacialpath, method = "K", test = TRUE)

Error in vcv.phylo(tree) : the tree has no branch lengths
    In addition: Warning message:
    In drop.tip(tree, setdiff(tree$tip.label, names(x))) :
      drop all tips of the tree: returning NULL

My tree phylo was separated from a complete phyloseq object phylo <- phy_tree(physeq = physeq) and has confirmed branch lengths:

Phylogenetic tree with 1290 tips and 1289 internal nodes.

Tip labels:
  e54924083c02bd088c69537d02406eb8, 3112899fc7a2adb7f74a081a82c7cde4, db5d745b02355b6fed513c4953b62326, 2faf2046aa9ab2f6598df79cd52e9c7b, bec8db81cc8ec453136c82ede8327a9f, 601e47b8adcbd21d159f74421710e1b5, ...
Node labels:
  0.942, 0.563, 0.711, 0.999, 0.000, 0.528, ...

Rooted; includes branch lengths.

A dput() of phylo is too large to include, so hopefully what I have provided is enough. Any help is appreciated.

> dput (metadata)
structure(list(sample = structure(1:48, .Label = c("sample-10", 
"sample-11", "sample-12", "sample-14", "sample-16", "sample-18", 
"sample-19", "sample-20", "sample-21", "sample-22", "sample-23", 
"sample-24", "sample-25", "sample-26", "sample-27", "sample-28", 
"sample-29", "sample-30", "sample-31", "sample-32", "sample-33", 
"sample-34", "sample-35", "sample-36", "sample-37", "sample-40", 
"sample-41", "sample-43", "sample-44", "sample-45", "sample-46", 
"sample-50", "sample-55", "sample-56", "sample-57", "sample-58", 
"sample-59", "sample-61", "sample-62", "sample-63", "sample-64", 
"sample-65", "sample-67", "sample-68", "sample-69", "sample-70", 
"sample-71", "sample-8"), class = "factor"), pond = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 1L), .Label = c("Lilly", 
"RHM1", "RHM2", "SS", "TS"), class = "factor"), meh = structure(c(16L, 
17L, 19L, 4L, 1L, 16L, 18L, 20L, 3L, 5L, 7L, 10L, 11L, 22L, 1L, 
2L, 14L, 16L, 18L, 20L, 3L, 5L, 7L, 10L, 1L, 15L, 16L, 18L, 19L, 
20L, 21L, 8L, 1L, 2L, 14L, 15L, 16L, 18L, 19L, 20L, 21L, 3L, 
5L, 6L, 9L, 12L, 13L, 1L), .Label = c("0-1", "1-2.", "10-11.", 
"11-12.", "12-13.", "13-14.5", "14-15", "14-16.", "14.5-17.5", 
"16-17", "17-19", "17.5-18.5", "18.5-20", "2-3.", "3-4.", "4-5.", 
"5-6.", "6-7.", "7-8.", "8-9.", "9-10.", "9-11."), class = "factor"), 
    depth = c(4, 5, 7, 11, 0, 4, 6, 8, 10, 12, 14, 16, 17, 9, 
    0, 1, 2, 4, 6, 8, 10, 12, 14, 16, 0, 3, 4, 6, 7, 8, 9, 14, 
    0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 14, 19, 18, 0), om = structure(c(29L, 
    13L, 9L, 9L, 27L, 27L, 26L, 26L, 23L, 23L, 12L, 12L, 20L, 
    34L, 24L, 24L, 22L, 22L, 25L, 25L, 21L, 21L, 11L, 11L, 8L, 
    7L, 5L, 3L, 3L, 2L, 1L, 4L, 33L, 30L, 31L, 32L, 28L, 18L, 
    19L, 17L, 15L, 17L, 16L, 14L, 18L, 19L, 10L, 6L), .Label = c("1.2", 
    "1.3", "1.4", "1.5", "1.7", "19.9", "2.1", "2.6", "2.8", 
    "2.9", "27.9", "29.8", "3.2", "3.3", "3.4", "3.5", "3.6", 
    "3.7", "3.8", "30.2", "30.6", "32.1", "32.5", "32.6", "32.9", 
    "35.7", "36.2", "4.2", "4.3", "5.4", "5.8", "6", "6.4", "na"
    ), class = "factor"), water = structure(c(19L, 16L, 12L, 
    13L, 28L, 28L, 35L, 35L, 30L, 30L, 31L, 31L, 28L, 36L, 27L, 
    27L, 33L, 33L, 34L, 34L, 31L, 31L, 29L, 29L, 20L, 21L, 15L, 
    5L, 3L, 2L, 1L, 4L, 26L, 23L, 24L, 25L, 22L, 13L, 14L, 7L, 
    8L, 9L, 11L, 10L, 17L, 18L, 6L, 32L), .Label = c("16.1", 
    "17", "17.5", "18.8", "21.6", "21.8", "22.3", "22.4", "22.6", 
    "22.7", "23.1", "23.6", "24.4", "24.5", "24.6", "25.5", "26.9", 
    "28", "29.1", "31.8", "32.8", "34.9", "40.5", "42.9", "43.2", 
    "50.9", "59", "60.1", "61.4", "61.5", "62.5", "62.9", "63.2", 
    "63.8", "66.4", "na"), class = "factor"), glacialpath = c(19.4, 
    19.4, 19.4, 19.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 
    6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 6.4, 
    18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 18.7, 16.8, 16.8, 
    16.8, 16.8, 16.8, 16.8, 16.8, 16.8, 16.8, 16.8, 16.8, 16.8, 
    16.8, 16.8, 16.8, 19.4), elevation = c(8.2, 8.2, 8.2, 8.2, 
    18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 
    18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 18.3, 
    13.4, 13.4, 13.4, 13.4, 13.4, 13.4, 13.4, 13.4, 5.5, 5.5, 
    5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 5.5, 
    5.5, 8.2), area = c(20536L, 20536L, 20536L, 20536L, 3571L, 
    3571L, 3571L, 3571L, 3571L, 3571L, 3571L, 3571L, 3571L, 3571L, 
    3571L, 3571L, 3571L, 3571L, 3571L, 3571L, 3571L, 3571L, 3571L, 
    3571L, 10369L, 10369L, 10369L, 10369L, 10369L, 10369L, 10369L, 
    10369L, 57003L, 57003L, 57003L, 57003L, 57003L, 57003L, 57003L, 
    57003L, 57003L, 57003L, 57003L, 57003L, 57003L, 57003L, 57003L, 
    20536L), tectonics = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("outwash", 
    "uplift"), class = "factor"), lat = c(60.52, 60.52, 60.52, 
    60.52, 60.478, 60.478, 60.478, 60.478, 60.478, 60.478, 60.478, 
    60.478, 60.478, 60.478, 60.478, 60.478, 60.478, 60.478, 60.478, 
    60.478, 60.478, 60.478, 60.478, 60.478, 60.491, 60.491, 60.491, 
    60.491, 60.491, 60.491, 60.491, 60.491, 60.425, 60.425, 60.425, 
    60.425, 60.425, 60.425, 60.425, 60.425, 60.425, 60.425, 60.425, 
    60.425, 60.425, 60.425, 60.425, 60.52), long = c(145.601, 
    145.601, 145.601, 145.601, 145.382, 145.382, 145.382, 145.382, 
    145.382, 145.382, 145.382, 145.382, 145.382, 145.382, 145.382, 
    145.382, 145.382, 145.382, 145.382, 145.382, 145.382, 145.382, 
    145.382, 145.382, 145.547, 145.547, 145.547, 145.547, 145.547, 
    145.547, 145.547, 145.547, 145.473, 145.473, 145.473, 145.473, 
    145.473, 145.473, 145.473, 145.473, 145.473, 145.473, 145.473, 
    145.473, 145.473, 145.473, 145.473, 145.601), x.depth = c(1L, 
    1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
    4L, 1L), y.depth = c(2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 
    19L, 20L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 
    5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 1L)), class = "data.frame", row.names = c(NA, 
-48L))

Solution

  • The problem is not the lack of branch lengths in the tree. What the error message is saying is that the tree does not have the tips that correspond to the names of values in the trait variable. glacialpath must be a named vector and the names must be present in the phylogeny.

    In general, a good practice is to check for possible problems in these three areas.

    1. phylo is not a correct phylo format.
    2. phylo should contain tips with names corresponding to those in the metadata.
    3. Related to #2, glacialpath lacks names.

    phylo is not a correct phylo format

    phytools::phylosig requires that the tree is in correct phylo format. Try to explore the tree with str(phylo), and see whether all values in phylo$edge.length are numeric and there are no missing values.

    phylo should contain tips with names corresponding to those in the metadata

    What are the samples the phylogenetic signal should be calculated for? Assuming that the column sample in the metadata contains names, reduce the tree to the size of the available data:

    tr2 <- ape::keep.tip(phy = phylo, tip = metadata$sample)
    plot(tr2) # to check that the tree was trimmed correctly
    

    While phytools::phylosig can trim the tree during processing, preparing a subset of the tree beforehand is preferable, because the user can check that the trimmed tree plots as expected, including branch lengths. In the given example, the tree tips are not names present in the trait vector names.

    glacialpath lacks names

    The glacialpath is not a named vector. To set the names of a vector, use:

    glacialpath <- setNames(object = metadata[, 7], nm = metadata$sample)
    

    Lastly, a good practice is to name variables differently to names of functions or, in this case, classes.