rdatabasecount

How to count unique values a column in R


I have a database and would like to know how many people (identified by ID) match a characteristic. The list is like this:

111   A
109   A
112   A
111   A
108   A

I only need to count how many 'ID's have that feature, the problem is there duplicated ID's. I've tried with

count(df, vars = ID)

but it does not show the total number of ID's, just how many times they are repeated. Same with

count(df, c('ID')

as it shows the total number of ID's and many of them are duplicated, I need to count them one single time.

Do you have any suggestions? Using table function is not an option because of the size of this database.


Solution

  • We can use n_distinct() from dplyr to count the number of unique values for a column in a data frame.

    textFile <- "id var1
    111   A
    109   A
    112   A
    111   A
    108   A"
    
    df <- read.table(text = textFile,header = TRUE)
    
    library(dplyr)
    df %>% summarise(count = n_distinct(id))
    #   count
    # 1     4
    

    We can also summarise the counts within one or more by_group() columns.

    textFile <- "id var1
    111   A
    109   A
    112   A
    111   A
    108   A
    201   B
    202   B
    202   B
    111   B
    112   B
    109   B"
    
    df <- read.table(text = textFile,header = TRUE)
    
    df %>%  group_by(var1) %>% summarise(count = n_distinct(id))
    # A tibble: 2 x 2
    #   var1  count
    #   <chr> <int>
    # 1 A         4
    # 2 B         5