I wish to loop through my data frame and change the value of one of the cells to avoid duplicates.
myDf
name day score
joe monday 16
joe monday 16
harry wednesday 88
harry thursday 55
james tuesday 10
will monday 10
harry wednesday 88
joe tuesday 16
joe monday 16
Here I have duplicates rows so to make them unique and keep the data I wish to paste attempt number after score like so;
mydf_new
name day score
joe monday 16 (a1)
joe monday 16 (a2)
harry wednesday 88 (a1)
harry thursday 55
james tuesday 10
will monday 10
harry wednesday 88 (a2)
joe tuesday 16
joe monday 16 (a3)
The main issue I'm having is how to loop through my data and group all rows that are not unique. I have managed to identify them using dupSax = sax = which(duplicated(c(paste(myDf$name, myDf$day, myDf$score)))|duplicated(c(paste(myDf$name, myDf$day, myDf$score)), fromLast = TRUE))
but when I do my loop I'm not sure how I grab all rows which relate rather than row by row. Also appreciate this might not be the easiest way to do this.
I have created a new column (identifier
) for the output so that the original column score
is not changed.
We can group the data by name
and day
, if the number of rows in a group is more than 1 we create a sequence a1
, a2
and paste it with score
value.
library(dplyr)
mydf |>
mutate(identifier = if(n() > 1) paste0(score, " (a", row_number(), ")")
else as.character(score), .by = c(name, day))
# name day score identifier
#1 joe monday 16 16 (a1)
#2 joe monday 16 16 (a2)
#3 harry wednesday 88 88 (a1)
#4 harry thursday 55 55
#5 james tuesday 10 10
#6 will monday 10 10
#7 harry wednesday 88 88 (a2)
#8 joe tuesday 16 16
#9 joe monday 16 16 (a3)
and the same logic translated in base R :
transform(mydf, identifier = ave(score, name, day, FUN = \(x)
if(length(x) > 1) paste0(x, " (a", seq_along(x), ")") else x))