r summarization

Return most frequent string value for each group


a <- rep(1:2, 3)
b <- c("A","A","B","B","B","B")
df <- data.frame(a,b)

> str(b)
chr [1:6] "A" "A" "B" "B" "B" "B"

  a b
1 1 A
2 2 A
3 1 B
4 2 B
5 1 B
6 2 B

I want to group by variable a and return the most frequent value of b within each group.

My desired result would look like this:

  a b
1 1 B
2 2 B

In dplyr it would be something like this, where most.frequent() is a placeholder for a function I do not have:

df %>% group_by(a) %>% summarize(b = most.frequent(b))

I mention dplyr only to illustrate the problem; the solution does not have to use it.


Solution

  • The key is to count by both a and b first, which gives the frequency of each combination, and then keep only the most frequent row within each group of a. For example:

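    df %>%
      count(a, b) %>%
      # group explicitly: in current dplyr, count() on an ungrouped
      # data frame returns an ungrouped result
      group_by(a) %>%
      slice(which.max(n))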
    
    Source: local data frame [2 x 3]
    Groups: a
    
      a b n
    1 1 B 2
    2 2 B 2
    

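    Note that which.max() keeps only the first maximum, so a tie within a group is resolved in favor of the value of b that sorts first. There are other approaches, of course; this is only one possible "key".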
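
As one of those other approaches, here is a minimal base R sketch that needs no packages. The helper name most_frequent is hypothetical (it is not part of base R or dplyr), and it assumes that ties should go to the value that sorts first.

```r
# Example data from the question
df <- data.frame(
  a = rep(1:2, 3),
  b = c("A", "A", "B", "B", "B", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the most frequent value of a vector.
# table() orders its names, and which.max() returns the first maximum,
# so ties go to the value that sorts first.
most_frequent <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Apply the helper to b within each level of a
result <- aggregate(df["b"], by = df["a"], FUN = most_frequent)
print(result)
```

The result should have one row per value of a, matching the desired output in the question. The same helper should also slot into the pipeline from the question, as in df %>% group_by(a) %>% summarize(b = most_frequent(b)).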