I want to group by column a and choose the most common factor b for each unique a. For example:
tibble(a = c(1,1,1,2,2,2), b = factor(c('cat', 'dog', 'cat', 'cat', 'dog', 'dog'))) %>%
reframe(b = most_common(b), .by = a)
I want this to produce:
| a | b |
|---|---|
| 1 | cat |
| 2 | dog |
However, the most_common function doesn't exist. Is there an efficient R function for this purpose? This must be a pretty common need in data cleaning, which is what I need it for. I searched and found people implementing mode functions. I could use one of those, but they seemed inefficient. Is there a better approach to this overall problem?
We can use table + max.col:
d <- table(df)
data.frame(
a = as.numeric(row.names(d)),
b = colnames(d)[max.col(d)]
)
which gives
a b
1 1 cat
2 2 dog
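One thing to watch with this approach: max.col breaks ties at random by default (its ties.method argument defaults to "random"), so a group with two equally common values can give different answers between runs. A minimal sketch of a deterministic variant, assuming df holds the data from the question:

```r
# Assumed input: the example data from the question, as a plain data frame.
df <- data.frame(
  a = c(1, 1, 1, 2, 2, 2),
  b = factor(c("cat", "dog", "cat", "cat", "dog", "dog"))
)

# Contingency table: one row per value of a, one column per level of b.
d <- table(df)

# ties.method = "first" resolves ties to the first column (the first
# factor level) instead of picking one of the tied columns at random.
res <- data.frame(
  a = as.numeric(row.names(d)),
  b = colnames(d)[max.col(d, ties.method = "first")]
)
```

For the example data there are no ties, so the result is the same as above.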
or using dplyr like below:
df %>%
  group_by(a) %>%
  summarise(b = names(which.max(table(b))))
which gives
# A tibble: 2 × 2
a b
<dbl> <chr>
1 1 cat
2 2 dog
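If you want the call from the question to work as written, the which.max(table()) idea can be wrapped in a small helper. This is only a sketch: most_common is the hypothetical name from the question, not a function from any package, and it assumes its input is a factor and that dplyr 1.1.0 or later is installed (needed for reframe and .by).

```r
library(dplyr)

# Hypothetical helper, named after the function the question asks for.
# Assumes x is a factor. Returns the most frequent level as a factor with
# the same levels as x. Ties go to the level that comes first in levels(x),
# because which.max() returns the first maximum; table() drops NAs by default.
most_common <- function(x) {
  counts <- table(x)
  factor(names(counts)[which.max(counts)], levels = levels(x))
}

df <- tibble(
  a = c(1, 1, 1, 2, 2, 2),
  b = factor(c("cat", "dog", "cat", "cat", "dog", "dog"))
)

result <- df %>%
  reframe(b = most_common(b), .by = a)
```

Unlike the summarise version above, this keeps b as a factor rather than converting it to character.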