rmultiple-conditionscumulative-frequency

cumulative frequency given certain conditions in R


I am new at R and I do not really know how to count cumulative number of occurrences of a row if it is the same ID but different date.Also, if the first date appears more than once, it should not count in the first attempt.Look at the rows 2 and 3, that is why the column I am trying to build is called "count" and equals to zero. "count" represents the times that the id has appeared but it follows the next two rules:

1.If it is the first appearence of the Id ever, it assigns zero. For example, for Id 1 despite there are two different registers for the very first date of this Id, "count" equals zero.

2.For the next Id 1 appereance (row 3 of Id1 ) it has a different date, that is why "count" equals 2. Because the 01/01/2018 Id1 appeared twice.

This is the DT and the result that I am looking for is in the "count" column:

enter image description here


Solution

  • We can use dplyr to group_by id and return 0 for the first date else return the count of occurrence of 1st date

    library(dplyr)
    
    df %>%
      group_by(id) %>%
      mutate(count = ifelse(date == date[1L], 0, sum(date == date[1L])))
    
    #     id   date       count
    #   <dbl> <fct>      <dbl>
    #1     1 01/01/2018     0
    #2     1 01/01/2018     0
    #3     1 02/01/2018     2
    #4     2 03/02/2018     0
    #5     2 04/04/2018     1
    

    Or using base R ave with similar logic

    as.numeric(with(df, ave(date, id, FUN = function(x) 
               ifelse(x == x[1L], 0, sum(x == x[1L])))))
    #[1] 0 0 2 0 1
    

    data

    df <- data.frame(id = c(1, 1, 1, 2, 2), date = c("01/01/2018", "01/01/2018", 
         "02/01/2018", "03/02/2018", "04/04/2018"), stringsAsFactors = FALSE)