rbioinformaticsphyloseq

Phyloseq, how obtain the relative Abundance by merge_samples?


I'm trying to obtain the relative abundance using a merge_sample option of the Phyloseq package.

When I calculate the average of each Phylum (I will use GlobalPatterns as example) with all the samples; I mean, Globalpaters have 26 samples so I made something like

library(phyloseq)
library(plyr)
data(GlobalPatterns)
TGroup <- tax_glom(GlobalPatterns, taxrank = "Phylum")
PGroup <- transform_sample_counts(TGroup, function(x)100* x / sum(x))
OTUg <- otu_table(PGroup)
TAXg <- tax_table(PGroup)[,"Phylum"]
AverageD <- as.data.frame(rowMeans(OTUg))
names(AverageD) <- c("Mean")
GTable <- merge(TAXg, AverageD, by=0, all=TRUE)
GTable$Row.names = NULL
GTable <- GTable[order(desc(GTable$Mean)),]
head(GTable)

I get something like :

        Phylum           Mean

1 Proteobacteria      29.45550
2 Firmicutes          18.87905
3 Bacteroidetes       17.34374
4 Cyanobacteria       13.70639
5 Actinobacteria      8.93446
6....... More.....

What I think it is ok !!!!

However when I tray to mage merge_samples( by: SampleType):

    ps <- tax_glom(GlobalPatterns, "Phylum")
    ps0 <- transform_sample_counts(ps, function(x)100* x / sum(x))
    ps1 <- merge_samples(ps0, "SampleType")
    ps2 <- transform_sample_counts(ps1, function(x)100* x / sum(x))
    ps3 <- ps2
    otu_table(ps3) <- t(otu_table(ps3)) # transpose the matrix otus !!!
    OTUg <- otu_table(ps3)
    TAXg <- tax_table(ps3)[,"Phylum"]
    GTable <- merge(TAXg, OTUg, by=0, all=TRUE)
    GTable$Row.names = NULL
    GTable$Mean=rowMeans(GTable[,-c(1)], na.rm=TRUE)
    GTable <- GTable[order(desc(GTable$Mean)),]
   head(GTable)

I get the same tax but with different percent in the Mean column :

  Phylum Feces Freshwater Freshwater Mock Ocean Sediment Skin Soil Tongue Mean
1 Proteobacteria  1.58 16.71 18.61 20.10 38.00 71.03 31.98 32.66 44.49 30.57
2 Firmicutes 54.82 0.12 0.65 41.42 0.08 2.53 30.67 0.64 21.67 16.96
3 Bacteroidetes 35.23 11.92 5.07 24.97 31.17 7.01 9.09 9.90 12.28 16.29
4 Cyanobacteria 2.63 30.17 62.57 0.16 19.18 3.24 4.65 0.97 6.61 14.46
5 Actinobacteria 3.47 37.11 1.74 8.39 5.12 1.04 16.78 9.99 7.49 10.13

well at this point with the merge_samples by SampleType, each column (Sample) will glom the taxa and the percent for each Phylum in each sample will change (Feces Freshwater Freshwater... ), I understand that but the general average of each Phylum must be the same, even if I merge the samples, in this case, the Mean is different (Proteobacteria 30.57, Firmicutes 16.9, Bacteroidetes 16.29........).

Any solution or advice ????

Thanks


Solution

  • With the first part, you are taking the means across all samples. In the second, your are taking the mean of grouped means. These two are only equivalent when the number of observations per group is the same.

    For example:

    # equal n for each group
    abundance = seq(0.1,0.6,by=0.1)
    group = rep(letters[1:3],each=2)
    mean(tapply(abundance,group,mean)) == mean(abundance)
    [1] TRUE
    
    # unequal n
    abundance = seq(0.1,0.6,by=0.1)
    group = rep(letters[1:3],1:3)
    mean(tapply(abundance,group,mean)) == mean(abundance)
    [1] FALSE
    

    Your n per SampleType is different

    TGroup <- tax_glom(GlobalPatterns, taxrank = "Phylum")
    PGroup <- transform_sample_counts(TGroup, function(x)100* x / sum(x))
    SampleType = sample_data(PGroup)$SampleType
    table(SampleType)
    
    SampleType
                 Feces         Freshwater Freshwater (creek)               Mock 
                     4                  2                  3                  3 
                 Ocean Sediment (estuary)               Skin               Soil 
                     3                  3                  3                  3 
                Tongue 
                     2 
    

    To get the same mean abundance across samples, you need to find the average abundance per SampleType, then average:

    mean_PGroup = sapply(levels(SampleType),function(i){
      rowMeans(otu_table(PGroup)[,SampleType==i])
    })
    
    phy = tax_table(PGroup)[rownames(mean_PGroup ),"Phylum"]
    rownames(mean_PGroup) = phy
    head(sort(rowMeans(mean_PGroup),decreasing=TRUE))
    
     Proteobacteria      Firmicutes   Bacteroidetes   Cyanobacteria  Actinobacteria 
          30.572773       16.956254       16.293286       14.463643       10.126875 
    Verrucomicrobia 
           2.774216