r

Looping through a data frame whilst grouping rows


I wish to loop through my data frame and change the value of one of the cells to avoid duplicates.

myDf
name     day        score
joe      monday     16
joe      monday     16
harry    wednesday  88
harry    thursday   55
james    tuesday    10
will     monday     10
harry    wednesday  88
joe      tuesday    16
joe      monday     16

Here I have duplicates rows so to make them unique and keep the data I wish to paste attempt number after score like so;

mydf_new
name      day        score
joe      monday     16 (a1)
joe      monday     16 (a2)
harry    wednesday  88 (a1)
harry    thursday   55
james    tuesday    10
will     monday     10
harry    wednesday  88 (a2)
joe      tuesday    16
joe      monday     16 (a3)

The main issue I'm having is how to loop through my data and group all rows that are not unique. I have managed to identify them using dupSax = sax = which(duplicated(c(paste(myDf$name, myDf$day, myDf$score)))|duplicated(c(paste(myDf$name, myDf$day, myDf$score)), fromLast = TRUE)) but when I do my loop I'm not sure how I grab all rows which relate rather than row by row. Also appreciate this might not be the easiest way to do this.


Solution

  • I have created a new column (identifier) for the output so that the original column score is not changed.

    We can group the data by name and day , if the number of rows in a group is more than 1 we create a sequence a1, a2 and paste it with score value.

    library(dplyr)
    
    mydf |>
      mutate(identifier = if(n() > 1) paste0(score, " (a", row_number(), ")") 
             else as.character(score), .by = c(name, day))
    
    #   name       day score identifier
    #1   joe    monday    16    16 (a1)
    #2   joe    monday    16    16 (a2)
    #3 harry wednesday    88    88 (a1)
    #4 harry  thursday    55         55
    #5 james   tuesday    10         10
    #6  will    monday    10         10
    #7 harry wednesday    88    88 (a2)
    #8   joe   tuesday    16         16
    #9   joe    monday    16    16 (a3)
    

    and the same logic translated in base R :

    transform(mydf, identifier = ave(score, name, day, FUN = \(x) 
                    if(length(x) > 1) paste0(x, " (a", seq_along(x), ")") else x))