r dplyr cumsum splitstackshape

Cumulative sum by participant that resets on 0 in R


I have a data frame like the one below. I need a running count of correct trials for each participant, and the counter should reset whenever Correct is 0.

Participant TrialNumber Correct 
      118           1       1     
      118           2       1     
      118           3       1     
      118           4       1     
      118           5       1     
      118           6       1     
      118           7       1     
      118           8       0     
      118           9       1     
      118          10       1     
      120           1       1     
      120           2       1     
      120           3       1     
      120           4       1     
      120           5       0     
      120           6       1     
      120           7       0     
      120           8       1     
      120           9       1     
      120          10       1     

I've tried using splitstackshape:

df$Count <- getanID(cbind(df$Participant, cumsum(df$Correct)))[,.id]

But this only marks each 0 row with a 2 instead of keeping a running count of correct trials for each participant:

Participant TrialNumber Correct Count
      118           1       1     1
      118           2       1     1
      118           3       1     1
      118           4       1     1
      118           5       1     1
      118           6       1     1
      118           7       1     1
      118           8       0     2
      118           9       1     1
      118          10       1     1
      120           1       1     1
      120           2       1     1
      120           3       1     1
      120           4       1     1
      120           5       0     2
      120           6       1     1
      120           7       0     2
      120           8       1     1
      120           9       1     1
      120          10       1     1

I then tried using dplyr:

library(dplyr)

df %>% 
  group_by(Participant) %>%
  mutate(Count = cumsum(Correct)) %>%
  ungroup() %>% 
  as.data.frame()

Participant TrialNumber Correct Count
      118           1       1     1
      118           2       1     2
      118           3       1     3
      118           4       1     4
      118           5       1     5
      118           6       1     6
      118           7       1     7
      118           8       0     7
      118           9       1     8
      118          10       1     9
      120           1       1     1
      120           2       1     2
      120           3       1     3
      120           4       1     4
      120           5       0     4
      120           6       1     5
      120           7       0     5
      120           8       1     6
      120           9       1     7
      120          10       1     8

This gets me closer, but the counter still doesn't reset when it reaches a 0. Any suggestions would be greatly appreciated. Thank you.


Solution

  • Does this work?

    library(dplyr)
    library(data.table)  # for rleid()
    df %>% 
      # label each run of identical Correct values
      mutate(grp = rleid(Correct)) %>%
      # restart the running sum within each participant and run
      group_by(Participant, grp) %>%
      mutate(Count = cumsum(Correct)) %>%
      # ungroup first; select() cannot drop a grouping column
      ungroup() %>%
      select(-grp)
    # A tibble: 10 x 3
       Participant Correct Count
       <chr>         <dbl> <dbl>
     1 A                 1     1
     2 A                 1     2
     3 A                 1     3
     4 A                 0     0
     5 A                 1     1
     6 B                 1     1
     7 B                 1     2
     8 B                 0     0
     9 B                 1     1
    10 B                 1     2
    

    Toy data (define this before running the code above):

    df <- data.frame(
      Participant = c(rep("A", 5), rep("B", 5)),
      Correct = c(1,1,1,0,1,1,1,0,1,1)
    )
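
If you would rather avoid extra packages, the same reset-on-0 idea can probably be written in base R with ave(). This is only a sketch, not the approach from the answer above: it rebuilds the data frame from the question, and the helper variable `run` is a name made up here. Each 0 starts a new run within a participant, and the running sum restarts inside every run.

```r
# Data from the question, rebuilt by hand
df <- data.frame(
  Participant = rep(c(118, 120), each = 10),
  TrialNumber = rep(1:10, times = 2),
  Correct = c(1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
              1, 1, 1, 1, 0, 1, 0, 1, 1, 1)
)

# `run` (hypothetical helper): per-participant run id that
# increases by one at every 0
run <- ave(as.integer(df$Correct == 0), df$Participant, FUN = cumsum)

# Running sum of Correct within each participant and run;
# a run begins at a 0, so the count restarts there
df$Count <- ave(df$Correct, df$Participant, run, FUN = cumsum)

df$Count
# 1 2 3 4 5 6 7 0 1 2 1 2 3 4 0 1 0 1 2 3
```

Unlike rleid(), which labels runs of identical values, this groups on the number of zeros seen so far, so each 0 row sits at the start of its own run and contributes 0 to the count.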