r, dplyr, tidyr, forcats

How to obtain the mean of the count across different group levels


I think what I am trying to do is simple, but I cannot quite crack it. I have a column "Random_ID" containing a participant code. In total I have 16 participants distributed across four schools, each identified by a School_ID number. My objective is to calculate the mean number of participants per school. In this case I know the answer is four, but I want a method that scales to considerably larger numbers of participants and schools. Any help would be much appreciated.

(df <- data.frame(
  Random_ID = c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8",
                "A9", "10", "A11", "A12", "A13", "A14", "A15", "A16"),
  School_ID = c("1", "2", "3", "4", "1", "2", "3", "4",
                "1", "2", "3", "4", "1", "2", "3", "4")
))
#>    Random_ID School_ID
#> 1         A1         1
#> 2         A2         2
#> 3         A3         3
#> 4         A4         4
#> 5         A5         1
#> 6         A6         2
#> 7         A7         3
#> 8         A8         4
#> 9         A9         1
#> 10        10         2
#> 11       A11         3
#> 12       A12         4
#> 13       A13         1
#> 14       A14         2
#> 15       A15         3
#> 16       A16         4

Created on 2023-03-01 with reprex v2.0.2


Solution

  • With dplyr you can count the rows (participants) per school and then take the mean of those counts:

    library(dplyr)
    df %>% 
      count(School_ID) %>% 
      summarize(mean(n))
    

    That leaves the value in a one-row data frame. If you want a plain numeric value instead, pull the counts out as a vector and take their mean:

    df %>% 
      count(School_ID) %>% 
      pull(n) %>% 
      mean()
    

    With base R you can use table() for the counting and then take the mean:

    mean(table(df$School_ID))
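
The approaches above count rows, so they assume each participant appears exactly once. As a minimal base R sketch for larger data sets, assuming (this is not stated in the question) that a participant could show up in more than one row, you could drop duplicate participant/school pairs before counting. The names `participants`, `per_school` and `mean_per_school` are only illustrative.

```r
# Same data as in the question.
df <- data.frame(
  Random_ID = c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8",
                "A9", "10", "A11", "A12", "A13", "A14", "A15", "A16"),
  School_ID = rep(c("1", "2", "3", "4"), times = 4)
)

# Assumption: a participant may appear in several rows in real data,
# so keep one row per participant/school pair before counting.
participants <- unique(df[, c("Random_ID", "School_ID")])

# Number of distinct participants in each school.
per_school <- table(participants$School_ID)

# Mean number of participants per school.
mean_per_school <- mean(per_school)
mean_per_school
```

If every participant already appears only once, as in the example data, this gives the same result as `mean(table(df$School_ID))`.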