I am using dplyr in R and I am trying to filter a tibble which contains transactional data.
The columns I am interested in are "Country" and "Sales".
I have a lot of countries, and for exploration purposes I want to analyze only the top 5 countries by total sales.
The trouble is that a grouped summary will not work for me, because I need to keep all the rows for further analysis (it is transactional data).
I tried something like:
trans_merch_df %>% group_by(COUNTRY) %>% top_n(n = 5, wt = NET_SLS_AMT)
But it's completely off.
Let's say I have this:
trans_merch_df <- tibble::tribble(~COUNTRY, ~SALE,
'POR', 14,
'POR', 1,
'DEU', 4,
'DEU', 6,
'POL', 8,
'ITA', 1,
'ITA', 1,
'ITA', 1,
'SPA', 1,
'NOR', 50,
'NOR', 10,
'SWE', 42,
'SWE', 1)
The result I am expecting is:
COUNTRY SALE
POR 14
POR 1
DEU 4
DEU 6
POL 8
NOR 50
NOR 10
SWE 42
SWE 1
ITA and SPA are dropped because they are not among the top 5 countries by sales.
Thanks a lot in advance.
Cheers!
A different dplyr possibility could be:
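library(dplyr)

# df is the tibble from the question (trans_merch_df)
df %>%
  add_count(COUNTRY, wt = SALE) %>%    # n = total SALE per country, repeated on every row
  mutate(n = dense_rank(desc(n))) %>%  # rank countries by total, 1 = highest
  filter(n %in% 1:5) %>%               # keep all rows of the five top-ranked countries
  select(-n)                           # drop the helper column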
  COUNTRY  SALE
  <chr>   <int>
1 POR        14
2 POR         1
3 DEU         4
4 DEU         6
5 POL         8
6 NOR        50
7 NOR        10
8 SWE        42
9 SWE         1
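To see why this works, it may help to look at the intermediate values before the filter. This is only a sketch: it assumes `df` holds the sample data from the question, and the column names `total` and `rank` are introduced here purely for illustration.

```r
library(dplyr)

# Sample data copied from the question
df <- tibble::tribble(
  ~COUNTRY, ~SALE,
  "POR", 14, "POR", 1,
  "DEU", 4,  "DEU", 6,
  "POL", 8,
  "ITA", 1,  "ITA", 1, "ITA", 1,
  "SPA", 1,
  "NOR", 50, "NOR", 10,
  "SWE", 42, "SWE", 1
)

ranked <- df %>%
  # add_count() keeps every row and attaches the per-country sum of SALE
  add_count(COUNTRY, wt = SALE, name = "total") %>%
  # dense_rank() on the descending total gives 1 to the best-selling country
  mutate(rank = dense_rank(desc(total)))

# One line per country, to inspect totals and ranks
country_ranks <- ranked %>%
  distinct(COUNTRY, total, rank) %>%
  arrange(rank)

print(country_ranks)
```

Because `add_count()` does not collapse the data, every transaction row carries its country's total, which is why the later `filter()` can keep or drop whole countries at once.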
Or even more concise:
df %>%
  add_count(COUNTRY, wt = SALE) %>%
  filter(dense_rank(desc(n)) %in% 1:5) %>%
  select(-n)
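One thing to be aware of: `dense_rank()` gives tied totals the same rank, so if two countries tie, the filter can return more than five countries. If exactly five are needed, a possible alternative is to compute the top countries first and then keep the matching rows with `semi_join()`. This is a sketch, not the method above; it assumes dplyr 1.0.0 or later (for `slice_max()`), and the names `top_countries`, `result`, and `total` are made up for the example.

```r
library(dplyr)

# Sample data copied from the question
df <- tibble::tribble(
  ~COUNTRY, ~SALE,
  "POR", 14, "POR", 1,
  "DEU", 4,  "DEU", 6,
  "POL", 8,
  "ITA", 1,  "ITA", 1, "ITA", 1,
  "SPA", 1,
  "NOR", 50, "NOR", 10,
  "SWE", 42, "SWE", 1
)

# Step 1: one row per country with its total sales, then the five largest.
# with_ties = FALSE caps the result at five rows even when totals tie.
top_countries <- df %>%
  count(COUNTRY, wt = SALE, name = "total") %>%
  slice_max(total, n = 5, with_ties = FALSE)

# Step 2: keep every transaction whose country is in that list.
# semi_join() filters df without adding columns or duplicating rows.
result <- df %>%
  semi_join(top_countries, by = "COUNTRY")

print(result)
```

With the sample data this keeps the same nine rows as the approaches above, since there are no ties among the top five totals.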