I have a database and would like to know how many people (identified by ID) match a characteristic. The list is like this:
111 A
109 A
112 A
111 A
108 A
I only need to count how many 'ID's have that feature, the problem is there duplicated ID's. I've tried with
count(df, vars = ID)
but it does not show the total number of ID's, just how many times they are repeated. Same with
count(df, c('ID')
as it shows the total number of ID's and many of them are duplicated, I need to count them one single time.
Do you have any suggestions? Using table function is not an option because of the size of this database.
We can use n_distinct() from dplyr to count the number of unique values for a column in a data frame.
textFile <- "id var1
111 A
109 A
112 A
111 A
108 A"
df <- read.table(text = textFile,header = TRUE)
library(dplyr)
df %>% summarise(count = n_distinct(id))
# count
# 1 4
We can also summarise the counts within one or more by_group() columns.
textFile <- "id var1
111 A
109 A
112 A
111 A
108 A
201 B
202 B
202 B
111 B
112 B
109 B"
df <- read.table(text = textFile,header = TRUE)
df %>% group_by(var1) %>% summarise(count = n_distinct(id))
# A tibble: 2 x 2
# var1 count
# <chr> <int>
# 1 A 4
# 2 B 5