rvectordplyrdata.tablemean

dplyr/data.table: How to calculate the mean for counts of observations per group for two vectors containing factors in R


Issue

I want to conduct a chi-square test on my data to explore if there are statistical differences between whistle types (A-F) amongst countries 'France' and 'Germany' (see the reproducible data frame below).

I would like to produce a table exactly the same as the Whistle Type Count table (see below) but instead of the counts of observations per whistle type, I would like to calculate the mean of each whistle type (A-F) per country using either the packages dplyr or data.table. Both the vectors Country and Whistle Type are factors.

For this analysis, how can I figure out how to calculate the mean for each whistle type per country?

Whistle Type Count Table

library(dplyr)

#Count the number of Whistle types per country
Count_Whistle.type <- Whistle_Parameters %>% dplyr::count(Whistle_Type, Country, sort=TRUE)

Counts of Observations per whistle type 

   Whistle_Type Country  n
1             F Germany 38
2             E  France 37
3             B  France 33
4             C Germany 33
5             B Germany 31
6             A  France 27
7             F  France 27
8             A Germany 25
9             D  France 23
10            C  France 21
11            D Germany 21
12            E Germany 19

What I need is something like this (these mean values are made up) showing the mean for each whistle type per country:

   Whistle_Type Country  Mean
1             F Germany 14.9
2             E  France 12.4
3             B  France  9.6

Reproducible R Code

#Dummy data
#Create a cluster column with dummy data (clusters = 3)
f1 <- gl(n = 2, k=167.5); f1

#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)

#Rename the column f2
colnames(f2)<-"Country"

#How many rows
nrow(f2)

#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")

#Add a vector called Whistle Types
#Add a vector called Behaviors
Whistle_Types<-sample(c('A', 'B', 'C', 'D',
                     'E', 'F'), 335, replace=TRUE)

#Create random numbers
Start.Freq<-runif(335, min=1.195110e+02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e+02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e+02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e+03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220+02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875+03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)

#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq,  Low.Freq, High.Freq, Peak.Freq,  Center.Freq, Delta.Freq, Delta.Time, Whistle_Types)

#Rename the columns 
colnames(Bind)<-c('Country', 'Low.Freq', 'High.Freq', 'Start.Freq', 'End.Freq', 'Peak.Freq', 'Center.Freq', 
                  'Delta.Freq', 'Delta.Time',"Whistle_Type")

#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
Whistle_Parameters

Solution

  • library(data.table)
    setDT(Whistle_Parameters)[, lapply(.SD, mean, na.rm=T), .(Country, Whistle_Type)]
    

    OR

    library(dplyr)
    Whistle_Parameters %>% 
      group_by(Country, Whistle_Type) %>% 
      summarize(across(everything(), mean))