Tags: r, dataframe, sequence, dplyr, sequences

Numeric sequence with condition


I have a big data.frame to which I want to add a new column (called Seq) containing a sequential count that restarts every time the value in another column changes. Here is an example of the data.frame (with some columns omitted), including the new Seq column. As you can see, the count is sequential, but every time there is a new IDPath it restarts at 1. The runs can have different lengths: some are 1 long, while others are 300.

IDPath    LogTime               Seq
AADS      19-06-2015 01:57      1
AADS      19-06-2015 01:55      2
AADS      19-06-2015 01:54      3
AADS      19-06-2015 01:53      4
DHSD      19-06-2015 12:57      1
DHSD      19-06-2015 10:58      2
DHSD      19-06-2015 09:08      3
DHSD      19-06-2015 08:41      4

Solution

  • Obligatory Hadleyverse answer (a base R answer follows the Hadleyverse one):

    library(dplyr)
    
    dat <- read.table(text="IDPath    LogTime 
    AADS      '19-06-2015 01:57'      
    AADS      '19-06-2015 01:55'    
    AADS      '19-06-2015 01:54'      
    AADS      '19-06-2015 01:53'      
    DHSD      '19-06-2015 12:57'      
    DHSD      '19-06-2015 10:58'      
    DHSD      '19-06-2015 09:08'      
    DHSD      '19-06-2015 08:41'      ", header=TRUE, stringsAsFactors=FALSE, quote="'")
    
    mutate(group_by(dat, IDPath), Seq=1:n())
    

    OR (via David Arenburg)

    mutate(group_by(dat, IDPath), Seq=row_number())
    

    Or if you're into piping:

    dat %>%
      group_by(IDPath) %>%
      mutate(Seq=1:n())
    

    OR (via David Arenburg)

    dat %>%
      group_by(IDPath) %>%
      mutate(Seq=row_number())
    

    Obligatory base R answer:

    # seq_along() is safer than 1:length(): it returns an empty vector for an empty group
    unsplit(lapply(split(dat, dat$IDPath), transform, Seq=seq_along(IDPath)), dat$IDPath)
    

    OR more idiomatically (via David again)

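    # ave() keeps the type of its first argument, so pass an integer vector rather than the character IDPath
    with(dat, ave(seq_along(IDPath), IDPath, FUN = seq_along))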
    

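    If it really is a HUGE data frame, you may want to start with tbl_dt(dat) for the dplyr solutions (the data.table backend has since moved to the dtplyr package, where lazy_dt() takes the place of tbl_dt()), but CathG's or Jaap's versions will be faster if you're already using data.table.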
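One caveat about the variants above: they all restart the counter per IDPath group, which matches "restart on every change" only if each IDPath occupies a single contiguous block of rows, as in the example data. The sketch below uses base R only and a made-up `ids` vector (hypothetical, not from the question) in which an ID reappears later, to show how a per-group counter differs from a per-run counter built with `rle()` and `sequence()`.

```r
# Hypothetical data: the ID "AADS" reappears after "DHSD".
ids <- c("AADS", "AADS", "DHSD", "AADS")

# Per-group counter (same idea as the ave() call above): the last "AADS"
# row continues the count started by the first two "AADS" rows.
seq_by_group <- ave(seq_along(ids), ids, FUN = seq_along)

# Per-run counter: rle() finds runs of consecutive equal values, and
# sequence() expands each run length n into 1..n, so the count restarts
# whenever the value changes from one row to the next.
seq_by_run <- sequence(rle(ids)$lengths)

print(seq_by_group)  # 1 2 1 3
print(seq_by_run)    # 1 2 1 1
```

If the data are sorted so that every IDPath forms one block, both counters give the same result, and the grouped dplyr or ave() versions are fine as they are.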