
Calculating weighted average according to various variable classes in R


I have size data for different species. Each sample represents a 1 m^2 plot of reef (a quadrat; "Unique"). There should be 5 quadrats at each site in any given year ("YrSi"), and any number of species in a quadrat (some quadrats have more than others, and the species often differ). I need to calculate the mean of "Size" weighted by "Count" (i.e. a weighted mean) for each "YrSi" (year-site combination) and "Taxa" (i.e. species). Example:

head(df)
         Unique   Yr Si   Qd      YrSi                   Taxa Count Size SLength
6  2007-M1-1991 2007 M1 1991 2007 - M1 Carpophyllum flexuosum     7   10        
7  2007-M1-1991 2007 M1 1991 2007 - M1 Carpophyllum flexuosum     1   15        
8  2007-M1-1991 2007 M1 1991 2007 - M1 Carpophyllum flexuosum     5   20        
9  2007-M1-1991 2007 M1 1991 2007 - M1 Carpophyllum flexuosum     4   25        
10 2007-M1-1991 2007 M1 1991 2007 - M1 Carpophyllum flexuosum     4   30        
11 2007-M1-1991 2007 M1 1991 2007 - M1 Carpophyllum flexuosum     1   35  

I tried using weighted.mean embedded within ddply, but the calculation is wrong: I get the same value for all species in all YrSi. I suspect it applied the weighted.mean calculation across all species and samples.

wt_mean.df = ddply(df, c("YrSi","Taxa"),
 function(x) weighted.mean(df$Size, df$Count))
head(wt_mean.df)
       YrSi                        Taxa       V1
1 2007 - C1           Buccinulum lineum 21.22346
2 2007 - C1       Cantharidus purpureus 21.22346
3 2007 - C1 Carpophyllum maschalocarpum 21.22346
4 2007 - C1           Cominella virgata 21.22346
5 2007 - C1              Cookia sulcata 21.22346
6 2007 - C1            Ecklonia radiata 21.22346

tail(wt_mean.df)
          YrSi                 Taxa       V1
1603 2019 - T5   Maoricolpus roseus 21.22346
1604 2019 - T5 Patiriella regularis 21.22346
1605 2019 - T5  Sargassum scabridum 21.22346
1606 2019 - T5 Sargassum sinclairii 21.22346
1607 2019 - T5      Trochus viridus 21.22346
1608 2019 - T5   Zonaria turneriana 21.22346

What am I doing wrong? Why don't I get the correct weighted means in V1? It would also be good to get a weighted SD, but I haven't looked into that yet. Please help.


Solution

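  • The function passed to ddply refers to df (the whole data frame) rather than x (the subset for the current YrSi/Taxa group), so every group returns the overall weighted mean. Using weighted.mean(x$Size, x$Count) inside the function fixes the ddply call. dplyr might also be an easy solution for you.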

    library(dplyr)
    output <- df %>%
      group_by(Yr, Si, Taxa) %>%
      summarise(wMean = weighted.mean(Size, Count))
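
The question also asks for a weighted SD. The following is only a sketch, not a definitive method: it assumes "Count" is a frequency weight (each counted individual treated as one observation of that "Size"), and `weighted_sd` is a hypothetical helper, not a function from base R or dplyr. The toy data frame is made up for illustration and only mimics the question's column names. The `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data modelled on the question's columns; the values are invented.
toy <- data.frame(
  YrSi  = c("2007 - M1", "2007 - M1", "2007 - M1", "2007 - C1", "2007 - C1"),
  Taxa  = c("A", "A", "B", "A", "A"),
  Count = c(7, 1, 5, 2, 2),
  Size  = c(10, 15, 20, 10, 30)
)

# Hypothetical helper: SD with w treated as frequency weights, i.e. the
# same result as sd() on the data expanded to one row per individual.
# Returns NA when fewer than two individuals were counted.
weighted_sd <- function(x, w) {
  n <- sum(w)
  if (n < 2) return(NA_real_)
  mu <- weighted.mean(x, w)
  sqrt(sum(w * (x - mu)^2) / (n - 1))
}

res <- toy %>%
  group_by(YrSi, Taxa) %>%
  # dplyr:: prefix guards against plyr::summarise masking it
  dplyr::summarise(
    wMean = weighted.mean(Size, Count),
    wSD   = weighted_sd(Size, Count),
    .groups = "drop"
  )

print(res)
```

If plyr is attached after dplyr, its `summarise` masks dplyr's and ignores the grouping, which again gives one value for everything; calling `dplyr::summarise` explicitly, as above, avoids that. If the counts are not meant as frequencies, a different variance denominator may be more appropriate.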