How to obtain the mean of the count across different group levels

I think what I am trying to do is simple, but I cannot quite crack it. I have a column "Random_ID" containing a participant code. In total I have 16 participants, distributed across four schools, each identified by a "School_ID" number. My objective is to calculate the mean number of participants per school. In this case I know it will be four, but I want the approach to scale to considerably larger numbers of participants and schools. Any help would be much appreciated.

(df <- data.frame(
  Random_ID = c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8",
                "A9", "A10", "A11", "A12", "A13", "A14", "A15", "A16"),
  School_ID = c("1", "2", "3", "4", "1", "2", "3", "4",
                "1", "2", "3", "4", "1", "2", "3", "4")
))
#>    Random_ID School_ID
#> 1         A1         1
#> 2         A2         2
#> 3         A3         3
#> 4         A4         4
#> 5         A5         1
#> 6         A6         2
#> 7         A7         3
#> 8         A8         4
#> 9         A9         1
#> 10       A10         2
#> 11       A11         3
#> 12       A12         4
#> 13       A13         1
#> 14       A14         2
#> 15       A15         3
#> 16       A16         4

Created on 2023-03-01 with reprex v2.0.2
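
A data frame with the same layout can also be generated rather than typed by hand, which makes it easy to test an approach on larger numbers of participants and schools. This is a minimal sketch: n_schools and n_per_school are hypothetical parameters introduced for illustration, and it assumes every school has the same number of participants, assigned in rotation as in the example above.

```r
# Hypothetical parameters: increase these to scale the example up
n_schools <- 4
n_per_school <- 4

# Participant codes A1, A2, ... and school IDs assigned in rotation
# (1, 2, 3, 4, 1, 2, ...), stored as character like in the hand-typed example
df <- data.frame(
  Random_ID = paste0("A", seq_len(n_schools * n_per_school)),
  School_ID = as.character(rep(seq_len(n_schools), times = n_per_school))
)

head(df)
```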

Solution

With dplyr, you can count the participants in each school and then take the mean of those counts:

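library(dplyr)

df %>%
  count(School_ID) %>%          # one row per school; n = number of rows (participants) in it
  summarize(mean_n = mean(n))   # mean of the per-school counts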

That returns the value inside a one-row data frame. If you want a plain numeric value instead, you can pull out the counts and take their mean:

df %>% 
  count(School_ID) %>% 
  pull(n) %>% 
  mean()
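
Note that count() counts rows, not participants. If your real data could contain more than one row per participant (for example repeated measurements, which is an assumption beyond the example in the question), counting rows would overstate the school sizes. A minimal sketch that counts distinct participants per school with n_distinct() instead, using a small hypothetical data frame:

```r
library(dplyr)

# Hypothetical data: participant A1 appears twice (e.g. two measurements)
df_long <- data.frame(
  Random_ID = c("A1", "A1", "A2", "A3", "A4", "A5"),
  School_ID = c("1", "1", "2", "1", "2", "2")
)

per_school <- df_long %>%
  group_by(School_ID) %>%
  summarize(n_participants = n_distinct(Random_ID))  # unique participants per school

# School 1 has 2 distinct participants, school 2 has 3, so the mean is 2.5
# (counting rows instead would give 3 per school)
mean(per_school$n_participants)
```

An equivalent route is to drop duplicate rows first with distinct(Random_ID, School_ID) and then run the count() pipeline shown above.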

In base R, you can use table() to count the participants in each school and then take the mean of those counts:

mean(table(df$School_ID))
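
The same one-liner works when schools have different numbers of participants. As a sanity check, when each row is one participant, the mean of the per-school counts equals the total number of rows divided by the number of schools. The small data frame below is hypothetical, made up for illustration:

```r
# Hypothetical example with unequal school sizes: 3, 1 and 2 participants
df_uneven <- data.frame(
  Random_ID = c("A1", "A2", "A3", "A4", "A5", "A6"),
  School_ID = c("1", "1", "1", "2", "3", "3")
)

counts <- table(df_uneven$School_ID)  # participants per school
counts
mean(counts)                          # mean of 3, 1 and 2, i.e. 2

# Equivalent arithmetic: total rows / number of distinct schools
nrow(df_uneven) / length(unique(df_uneven$School_ID))
```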