I think what I am trying to do is simple, but I cannot quite crack it. I have a column "Random_ID" with a participant code. In total I have 16 different participants, distributed across four schools, each represented by a School_ID number. My objective is to calculate the mean number of participants per school. In this case I know it will be four, but I want an approach that scales to considerably larger numbers of participants and schools. Any help would be much appreciated.
(df <- data.frame(
  Random_ID = c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8",
                "A9", "10", "A11", "A12", "A13", "A14", "A15", "A16"),
  School_ID = c("1", "2", "3", "4", "1", "2", "3", "4",
                "1", "2", "3", "4", "1", "2", "3", "4")
))
#>    Random_ID School_ID
#> 1         A1         1
#> 2         A2         2
#> 3         A3         3
#> 4         A4         4
#> 5         A5         1
#> 6         A6         2
#> 7         A7         3
#> 8         A8         4
#> 9         A9         1
#> 10        10         2
#> 11       A11         3
#> 12       A12         4
#> 13       A13         1
#> 14       A14         2
#> 15       A15         3
#> 16       A16         4
Created on 2023-03-01 with reprex v2.0.2
With dplyr, you can count the participants per school and then take the mean of those counts:
library(dplyr)

df %>%
  count(School_ID) %>%
  summarize(mean(n))
That leaves the value in a data.frame. If you want it as a plain numeric value instead, you could do:
df %>%
  count(School_ID) %>%
  pull(n) %>%
  mean()
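One caveat when scaling this up: count() counts rows, not participants. If the larger dataset could contain more than one row per participant (an assumption; the example has exactly one row each), a variant that counts distinct Random_ID values per school may be safer. A minimal sketch, using a small made-up data frame (df2 is hypothetical, not the asker's data) in which participant A1 appears twice:

```r
library(dplyr)

# Hypothetical data: participant A1 has two rows in school 1
df2 <- data.frame(
  Random_ID = c("A1", "A1", "A2", "A3", "A4", "A5"),
  School_ID = c("1", "1", "1", "2", "2", "2")
)

mean_participants <- df2 %>%
  group_by(School_ID) %>%
  summarize(n = n_distinct(Random_ID)) %>%  # distinct participants per school
  pull(n) %>%
  mean()

mean_participants
```

Here school 1 has two distinct participants and school 2 has three, so the result is 2.5, whereas counting rows would give 3.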
With base R, you can use table() for the counting and then take the mean:
mean(table(df$School_ID))
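As a quick sanity check on the base R result: when every row is one participant, the mean of the per-school counts is just the total number of rows divided by the number of distinct schools. A short sketch that rebuilds the example data (the Random_ID values here are stand-ins generated with paste0(), since the exact IDs do not affect the counts) and compares the two:

```r
# Rebuild the example: 16 participants spread evenly over 4 schools
df <- data.frame(
  Random_ID = paste0("A", 1:16),                       # stand-in IDs
  School_ID = rep(c("1", "2", "3", "4"), times = 4)
)

counts <- table(df$School_ID)          # participants (rows) per school
mean_from_table <- mean(counts)

# Same quantity computed directly: rows / number of distinct schools
mean_from_ratio <- nrow(df) / length(unique(df$School_ID))

mean_from_table
mean_from_ratio
```

Both values are 4 for this data, matching the figure expected in the question. Note that table() only reports schools that actually occur in the column, so a school with zero participants would not pull the mean down unless School_ID is a factor with that level included.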