i've searching for how to do this in R, but unfortunately didn't found an easy way to do it.
if i have a dataset called people like this
A | B |
---|---|
John | Student |
John | Student |
John | Student |
Sarah | Student |
Sarah | Student |
Mickey | Student |
i want to add a column in that dataset that reflect the number of times the values on A column are repeated. For example an output like this
A | B | C |
---|---|---|
John | Student | 3 |
John | Student | 3 |
John | Student | 3 |
Sarah | Student | 2 |
Sarah | Student | 2 |
Mickey | Student | 1 |
Please! any help are very welcome!
I've tried without success the following code
people <-aggregate(people$A, people, lengths)
the output of this, is a new column x on people but all values are 1 for each row
Data in reproducible format
people <- structure(list(A = c("John", "John", "John", "Sarah", "Sarah",
"Mickey"), B = c("Student", "Student", "Student", "Student",
"Student", "Student")), class = "data.frame", row.names = c(NA,
-6L))
Use ave
instead of aggregate
if you want the output vector to match the length of the grouping variable:
people$C <- as.numeric(ave(people$A, people$A, FUN = length))
people
#> A B C
#> 1 John Student 3
#> 2 John Student 3
#> 3 John Student 3
#> 4 Sarah Student 2
#> 5 Sarah Student 2
#> 6 Mickey Student 1
Or use run length encoding (rle
) and rep
people$C <- with(rle(people$A), rep(lengths, lengths))
people
#> A B C
#> 1 John Student 3
#> 2 John Student 3
#> 3 John Student 3
#> 4 Sarah Student 2
#> 5 Sarah Student 2
#> 6 Mickey Student 1
Or use n()
in dplyr
whilst using A
as the grouping variable.
library(dplyr)
people %>% mutate(C = n(), .by = A)
#> A B C
#> 1 John Student 3
#> 2 John Student 3
#> 3 John Student 3
#> 4 Sarah Student 2
#> 5 Sarah Student 2
#> 6 Mickey Student 1