Issue
I want to conduct a chi-square test on my data to explore if there are statistical differences between whistle types (A-F)
amongst countries 'France' and 'Germany'
(see the reproducible data frame below).
I would like to produce a table exactly the same as the Whistle Type Count table
(see below) but instead of the counts of observations per whistle type, I would like to calculate the mean of each whistle type (A-F) per country using either the packages dplyr
or data.table
. Both the vectors Country
and Whistle Type
are factors.
For this analysis, how can I figure out how to calculate the mean for each whistle type per country?
Whistle Type Count Table
library(dplyr)
#Count the number of Whistle types per country
Count_Whistle.type <- Whistle_Parameters %>% dplyr::count(Whistle_Type, Country, sort=TRUE)
Counts of Observations per whistle type
Whistle_Type Country n
1 F Germany 38
2 E France 37
3 B France 33
4 C Germany 33
5 B Germany 31
6 A France 27
7 F France 27
8 A Germany 25
9 D France 23
10 C France 21
11 D Germany 21
12 E Germany 19
What I need is something like this (these mean values are made up) showing the mean for each whistle type per country:
Whistle_Type Country Mean
1 F Germany 14.9
2 E France 12.4
3 B France 9.6
Reproducible R Code
#Dummy data
#Create a cluster column with dummy data (clusters = 3)
f1 <- gl(n = 2, k=167.5); f1
#Produce a data frame for the dummy level data
f2<-as.data.frame(f1)
#Rename the column f2
colnames(f2)<-"Country"
#How many rows
nrow(f2)
#Rename the levels of the dependent variable 'Country' as classifiers
#prefer the inputs to be factors
levels(f2$Country) <- c("France", "Germany")
#Add a vector called Whistle Types
#Add a vector called Behaviors
Whistle_Types<-sample(c('A', 'B', 'C', 'D',
'E', 'F'), 335, replace=TRUE)
#Create random numbers
Start.Freq<-runif(335, min=1.195110e+02, max=23306.000000)
End.Freq<-runif(335, min=3.750000e+02, max=65310.000000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
Low.Freq<-runif(335, min=6.592500e+02, max=20491.803000)
High.Freq<-runif(335, min=2.051000e+03, max=36388.450000)
Peak.Freq<-runif(335, min=7.324220+02, max=35595.703000)
Center.Freq<-runif(335, min=2.190000e-02, max=3.155800)
Delta.Freq<-runif(335, min=1.171875+03, max=30761.719000)
Delta.Time<-runif(335, min=2.192504e-02, max=3.155762)
#Bind the columns together
Bind<-cbind(f2, Start.Freq, End.Freq, Low.Freq, High.Freq, Peak.Freq, Center.Freq, Delta.Freq, Delta.Time, Whistle_Types)
#Rename the columns
colnames(Bind)<-c('Country', 'Low.Freq', 'High.Freq', 'Start.Freq', 'End.Freq', 'Peak.Freq', 'Center.Freq',
'Delta.Freq', 'Delta.Time',"Whistle_Type")
#Produce a dataframe
Whistle_Parameters<-as.data.frame(Bind)
Whistle_Parameters
library(data.table)
setDT(Whistle_Parameters)[, lapply(.SD, mean, na.rm=T), .(Country, Whistle_Type)]
OR
library(dplyr)
Whistle_Parameters %>%
group_by(Country, Whistle_Type) %>%
summarize(across(everything(), mean))