rdata-manipulationclickstream

Getting clickstream sequence from data frame


I have a clickstream .csv data file in this format :

     Step_Name   Session_ID
        p-1          1
        p-2          1
        defer        1
        p-1          2
        p-3          2
        buy          2
        p-4          3 
        defer        3

I am trying to use clickstream package in R for analysis. This package requires us to convert the above data into this format :

   1    p-1    p-2    defer
   2    p-1    p-3    buy 
   3    p-4    defer   

where the first column is Session ID and number of columns may vary for each session depending on the number of steps the website user has taken. How do I get the initial data into the above format (in either excel or R)?

Also, if there are other ways to analyse clickstream data in R without such data manipulation, please let me know.


Solution

  • Using data.table, load your data:

    > library(data.table)
    > df = data.table('Step_Name' = c('p-1', 'p-2', 'defer','p-1','p-3','buy','p-4','defer'), 'Session_ID' = c(rep(1,3),rep(2,3),rep(3,2)))
    > df
       Step_Name Session_ID
    1:       p-1          1
    2:       p-2          1
    3:     defer          1
    4:       p-1          2
    5:       p-3          2
    6:       buy          2
    7:       p-4          3
    8:     defer          3
    

    Add a row number ID, restarting for each new Session_ID.

    > df[, id := seq_len(.N), by = Session_ID]
    

    Cast your data table:

    > df_final = dcast(df, Session_ID ~ id, value.var = 'Step_Name')
    

    And your desired result:

    > df_final
       Session_ID   1     2     3
    1:          1 p-1   p-2 defer
    2:          2 p-1   p-3   buy
    3:          3 p-4 defer  <NA>
    

    NB: This relies on your data being in order within Session_ID.