Consider the following dataset:
library(phangorn)  # also attaches ape, which provides prop.part(), prop.clades() and unroot()
fictional.df <- data.frame(L1 = c(0,0,0,0,0,0,0,0),
                           L2 = c(0,1,0,0,0,1,1,0),
                           L3 = c(1,1,0,1,1,1,1,1),
                           L4 = c(0,0,1,1,0,0,0,0))
I converted this to a phyDat object and then created a pairwise distance matrix as follows:
fictional.phydat <- as.phyDat(fictional.df,
type="USER",levels=c("1","0"),
names=names(fictional.df))
fictional.hamming <- dist.hamming(fictional.phydat)
From this distance matrix, I then estimated a UPGMA tree:
fictional.upgma <- upgma(fictional.hamming)
I then created bootstrap datasets:
set.seed(187)
fictional.upgma.bs <- bootstrap.phyDat(fictional.phydat, FUN =
function(xx) upgma(dist.hamming(xx)), bs=100)
I then calculated the proportion of partitions in the bootstrap set:
upgma.bs.part <- prop.part(fictional.upgma.bs)
So far so good. Here is where I would appreciate some help. When I call the function prop.clades, I do not understand the result:
prop.clades(fictional.upgma,fictional.upgma.bs)
[1] 100 NA 71
Why does this function return NA when there is evidence for that clade in the set of bootstrap trees?
A second question:
prop.clades(fictional.upgma,part=upgma.bs.part)
[1] 100 49 112
If there are only 100 bootstrap samples, why is the value for the final clade 112?
Your tree fictional.upgma is rooted, and prop.clades by default returns how often each bipartition occurs. In a rooted tree, the two edges leading to the root both refer to the same bipartition (split), so unrooting the tree first leaves one value per split:
prop.clades(unroot(fictional.upgma), fictional.upgma.bs)
[1] 100 71
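To see why the two root edges describe the same split, here is a minimal base R sketch. It does not use ape or the data from the question; the four taxa A to D and the helper split_label() are hypothetical names made up for illustration, and the tree is assumed to be ((A,B),(C,D)).

```r
# Hypothetical four-taxon example; not derived from the question's data.
taxa <- c("A", "B", "C", "D")

# Canonical label for the split induced by a clade: of the clade and its
# complement, keep the side that contains the first taxon.
split_label <- function(clade, taxa) {
  side <- if (taxa[1] %in% clade) clade else setdiff(taxa, clade)
  paste(sort(side), collapse = "+")
}

left_child  <- c("A", "B")  # clade below one root edge of ((A,B),(C,D))
right_child <- c("C", "D")  # clade below the other root edge

split_label(left_child, taxa)   # "A+B"
split_label(right_child, taxa)  # "A+B"
identical(split_label(left_child, taxa), split_label(right_child, taxa))  # TRUE
```

The two clades differ, but each is the complement of the other, so as bipartitions they are indistinguishable.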
For rooted trees you sometimes want to count the number of identical clades instead:
prop.clades(fictional.upgma, fictional.upgma.bs, rooted=TRUE)
[1] 100 49 71
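The difference between counting clades and counting bipartitions can be illustrated with a small base R sketch. This is not how ape implements prop.clades; the three "trees" below are hypothetical and are represented simply as lists of their non-trivial clades, and key(), split_key(), count_clade() and count_split() are made-up helper names.

```r
# Hypothetical taxa and trees; each tree is a list of its non-trivial clades.
taxa <- c("A", "B", "C", "D")
trees <- list(
  list(c("A", "B"), c("C", "D")),       # ((A,B),(C,D))
  list(c("A", "B"), c("A", "B", "C")),  # (((A,B),C),D)
  list(c("C", "D"), c("B", "C", "D"))   # (A,(B,(C,D)))
)

# Order-independent label for a clade.
key <- function(x) paste(sort(x), collapse = "+")

# Label for the split a clade induces: the side containing the first taxon.
split_key <- function(clade) {
  key(if (taxa[1] %in% clade) clade else setdiff(taxa, clade))
}

# Number of trees that contain the clade itself (rooted view).
count_clade <- function(clade, trees) {
  sum(vapply(trees, function(tr) key(clade) %in% vapply(tr, key, character(1)),
             logical(1)))
}

# Number of trees that contain the split induced by the clade (unrooted view).
count_split <- function(clade, trees) {
  sum(vapply(trees, function(tr) split_key(clade) %in% vapply(tr, split_key, character(1)),
             logical(1)))
}

count_clade(c("A", "B"), trees)  # 2
count_clade(c("C", "D"), trees)  # 2
count_split(c("A", "B"), trees)  # 3
```

In this toy set, the clades (A,B) and (C,D) each appear in two trees, while the split AB|CD appears in all three, which is why rooted and unrooted counts can differ for the same node.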
The following looks like a bug, and it would be best to report it to Emmanuel Paradis, the maintainer of ape:
prop.clades(fictional.upgma,part=upgma.bs.part)
[1] 100 49 112