I have a sample dataset:
df <- data.frame(category = c("A", "A", "B", "C", "C", "D", "E", "C", "E", "A", "B", "C", "B", "B", "B", "D", "D", "D", "D", "B"), year = c(1, 2, 1, 2, 3, 2, 3, 1, 3, 2, 1, 1, 2, 1, 2, 3, 1, 2, 3, 1))
and would like to create a cross-tabulation of year
and category
such that only the 3 most frequent categories are in the table and also sorted by total number of occurences:
1 2 3
B 4 2 0
D 1 2 2
C 2 1 1
Using something like
df %>%
add_count(category) %>%
filter(n %in% tail(sort(unique(n)),3)) %>%
arrange(desc(n)) %>% {table(.$category, .$year)}
will filter for the three most occurring categories but leave the table unsorted
1 2 3
B 4 2 0
C 2 1 1
D 1 2 2
This should give you what you want.
# Make a table
df.t <- table(df)
# Order by top occurrences (sum over margin 1)
df.t <- df.t[order(apply(df.t, 1, sum), decreasing=TRUE),]
# Keep top 3 results
df.t <- df.t[1:3,]
Output:
year
category 1 2 3
B 4 2 0
D 1 2 2
C 2 1 1