[SOLVED] How to group_by(x) and summarise by counting distinct(y) for each x level?

I have the following situation:

V1	V2
A	A1
A	A1
A	A1
A	A2
A	A2
A	A3
B	B1
B	B2
B	B2

and I need to group by V1 and count how many distinct V2 values each V1 level has. Something like this:

V1	n
A	3
B	2

How can I use dplyr functions to solve that?

Thanks!!

Solution

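We can use rle after grouping by 'V1'. Note that rle counts runs of consecutive identical values, so the result equals the number of distinct values only when equal V2 values sit next to each other within each group, as they do in the table above.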

library(dplyr)
df1 %>%
   group_by(V1) %>%
   summarise(n = length(rle(V2)$values), .groups = 'drop')

-output

# A tibble: 2 × 2
  V1        n
  <chr> <int>
1 A         3
2 B         2
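
To make the run-based behaviour concrete, here is a small sketch with a made-up vector v (not part of the question's data) in which "A1" reappears after "A2". It suggests why rle and a plain distinct count can disagree:

```r
library(dplyr)

# Hypothetical vector: "A1" comes back after "A2", so equal values are not contiguous
v <- c("A1", "A1", "A1", "A2", "A2", "A1")

runs <- rle(v)
runs$values          # "A1" "A2" "A1": one entry per run
length(runs$values)  # 3 runs
n_distinct(v)        # 2 distinct values
```

If V2 is sorted within each V1 group, the two counts agree.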

Or with rleid (from data.table) and n_distinct, which likewise counts runs within each group

library(data.table)
df1 %>% 
  group_by(V1) %>% 
  summarise(n = n_distinct(rleid(V2)))

-output

# A tibble: 2 × 2
  V1        n
  <chr> <int>
1 A         3
2 B         2
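
If the goal is the number of distinct V2 values regardless of row order, n_distinct can also be applied to V2 directly. A minimal sketch, assuming the data matches the table in the question (it is rebuilt here as df_q, a name chosen only for illustration):

```r
library(dplyr)

# Data as shown in the question's table (df_q is an illustrative name)
df_q <- data.frame(
  V1 = c("A", "A", "A", "A", "A", "A", "B", "B", "B"),
  V2 = c("A1", "A1", "A1", "A2", "A2", "A3", "B1", "B2", "B2")
)

# Count distinct V2 values within each V1 group; row order does not matter
res <- df_q %>%
  group_by(V1) %>%
  summarise(n = n_distinct(V2), .groups = "drop")

res

# Equivalent: keep one row per (V1, V2) pair, then count rows per V1
res2 <- df_q %>%
  distinct(V1, V2) %>%
  count(V1)
```

Both res and res2 should give n = 3 for A and n = 2 for B on this data.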

data

df1 <- structure(list(V1 = c("A", "A", "A", "A", "A", "A", "B", "B", 
"B"), V2 = c("A1", "A1", "A1", "A2", "A2", "A3", "B1", "B2", 
"B2")), class = "data.frame", row.names = c(NA, -9L))