r date session unique-values

Collapse and count the number of unique values


I am currently working on an application where I have a dataframe that looks like this:

Database
UserId         Hour         Date
01                18           01.01.2016
01                18           01.01.2016
01                14           02.01.2016
01                14           02.01.2016
02                21           02.01.2016
02                08           05.01.2016
02                08           05.01.2016
03                23           05.01.2016

Each line represents a session.

I need to determine whether the time of the first session of a user has an impact on the number of sessions this user is going to have.

I have tried the command summaryBy:

library(doBy)
first_hour <- summaryBy(UserId + Hour + Date ~ UserId, 
    FUN=c(head, length, unique), database)

But it doesn't give me the correct result.

My goal is to determine, for each user, the Hour of the first session, the total number of sessions, and the number of distinct session dates.


Solution

  • We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1) (df1 here stands for the data frame from the question), order the rows by 'UserId' and 'Date', and then, grouped by 'UserId', take the first 'Hour', the total number of sessions (.N), and the number of unique 'Date' values (uniqueN(Date)).

    library(data.table)
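    # Sort by user and parsed date, then summarise per user.
    # "%m.%d.%Y" reads the dates as month.day.year; use "%d.%m.%Y" if they are day.month.year.
    setDT(df1)[order(UserId, as.Date(Date, "%m.%d.%Y")),
               .(Hour = Hour[1L], Sessions = .N, DifferSessionDate = uniqueN(Date)),
               by = UserId]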
    #    UserId Hour Sessions DifferSessionDate
    #1:      1   18        4                 2
    #2:      2   21        3                 2
    #3:      3   23        1                 1
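
If data.table is not available, the same three summaries can be sketched in base R. This is only an illustration under a few assumptions: the sample data is rebuilt by hand from the question, the name df1 and the "%m.%d.%Y" date format are carried over from the code above, and ParsedDate, per_user and result are names made up for this example.

```r
# Rebuild the question's sample data (the name df1 follows the answer's code).
df1 <- data.frame(
  UserId = c(1, 1, 1, 1, 2, 2, 2, 3),
  Hour   = c(18, 18, 14, 14, 21, 8, 8, 23),
  Date   = c("01.01.2016", "01.01.2016", "02.01.2016", "02.01.2016",
             "02.01.2016", "05.01.2016", "05.01.2016", "05.01.2016"),
  stringsAsFactors = FALSE
)

# Assumption: same format string as above; use "%d.%m.%Y" for day.month.year.
df1$ParsedDate <- as.Date(df1$Date, format = "%m.%d.%Y")

# Sort by user and date so the first row of each user is the earliest session.
df1 <- df1[order(df1$UserId, df1$ParsedDate), ]

# Summarise each user's rows, then stack the per-user results.
per_user <- lapply(split(df1, df1$UserId), function(d) {
  data.frame(
    UserId            = d$UserId[1],
    Hour              = d$Hour[1],                      # hour of first session
    Sessions          = nrow(d),                        # total sessions
    DifferSessionDate = length(unique(d$ParsedDate))    # distinct dates
  )
})
result <- do.call(rbind, per_user)
rownames(result) <- NULL
print(result)
```

The split/lapply/rbind pattern is slower than data.table on large data, but it needs no extra packages and keeps each step visible.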