rdatasetaggregate

R - add column to dataset with number of times that a row value is repeated


i've searching for how to do this in R, but unfortunately didn't found an easy way to do it.

if i have a dataset called people like this

A B
John Student
John Student
John Student
Sarah Student
Sarah Student
Mickey Student

i want to add a column in that dataset that reflect the number of times the values on A column are repeated. For example an output like this

A B C
John Student 3
John Student 3
John Student 3
Sarah Student 2
Sarah Student 2
Mickey Student 1

Please! any help are very welcome!

I've tried without success the following code

people <-aggregate(people$A, people, lengths)

the output of this, is a new column x on people but all values are 1 for each row


Data in reproducible format

people <- structure(list(A = c("John", "John", "John", "Sarah", "Sarah", 
"Mickey"), B = c("Student", "Student", "Student", "Student", 
"Student", "Student")), class = "data.frame", row.names = c(NA, 
-6L))

Solution

  • Use ave instead of aggregate if you want the output vector to match the length of the grouping variable:

    people$C <- as.numeric(ave(people$A, people$A, FUN = length))
    
    people
    #>        A       B C
    #> 1   John Student 3
    #> 2   John Student 3
    #> 3   John Student 3
    #> 4  Sarah Student 2
    #> 5  Sarah Student 2
    #> 6 Mickey Student 1
    

    Or use run length encoding (rle) and rep

    people$C <- with(rle(people$A), rep(lengths, lengths))
    
    people
    #>        A       B C
    #> 1   John Student 3
    #> 2   John Student 3
    #> 3   John Student 3
    #> 4  Sarah Student 2
    #> 5  Sarah Student 2
    #> 6 Mickey Student 1
    

    Or use n() in dplyr whilst using A as the grouping variable.

    library(dplyr)
    
    people %>% mutate(C = n(), .by = A)
    #>        A       B C
    #> 1   John Student 3
    #> 2   John Student 3
    #> 3   John Student 3
    #> 4  Sarah Student 2
    #> 5  Sarah Student 2
    #> 6 Mickey Student 1