
Running count based on field in R


I have a data set in this format:

User       
1 
2
3
2
3
1  
1      

Now I want to add a column called Count that holds the running count of each user, i.e. how many times that user has appeared so far. I want the output in the format below:

User    Count
1       1
2       1 
3       1
2       2
3       2
1       2
1       3

I have a few solutions, but all of them are somewhat slow:

Running count variable in R

My data.frame has 100,000 rows now and may soon grow to 1 million, so I need a solution that is also fast.


Solution

  • You can use getanID from my "splitstackshape" package:

    library(splitstackshape)
    getanID(mydf, "User")
    ##    User .id
    ## 1:    1   1
    ## 2:    2   1
    ## 3:    3   1
    ## 4:    2   2
    ## 5:    3   2
    ## 6:    1   2
    ## 7:    1   3
    

    This is essentially an approach with "data.table" that looks something like the following:

    library(data.table)
    as.data.table(mydf)[, count := seq_len(.N), by = "User"][]
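
    For comparison, the same running count can be sketched in base R with ave(), with no extra packages. This is only a minimal illustration on the sample data from the question. The column name Count is taken from the desired output, and it has not been benchmarked against the "data.table" approach, so treat it as a reference point rather than a recommendation for large data:

```r
# Sample data matching the question (mydf is the name used in the answer above).
mydf <- data.frame(User = c(1, 2, 3, 2, 3, 1, 1))

# ave() applies FUN within each group of User and puts the results back
# in the original row order; seq_along() numbers the rows of each group.
mydf$Count <- ave(seq_along(mydf$User), mydf$User, FUN = seq_along)

print(mydf)
```

    Passing seq_along(mydf$User) as the first argument keeps the result as an integer vector, whatever the type of User is.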