rsequence-analysis

R -need help putting matrix into basket or transaction form


   Server  Epoch      A B C D E
    1 C301 1420100400 1 0 1 0 0
    2 C301 1420100700 0 0 0 0 0
    3 C301 1420152000 0 1 0 0 0
    4 C301 1420238100 1 1 1 0 0
    5 C301 1420324500 1 1 1 1 1

I need help getting the matrix above into basket or transaction form (to use with cSpade algorithm in package arulesSequences) such that every "1" in the matrix is a transaction item. ie, the output would look something like this:

Server    Epoch       #items    Items
C301      1420100400  2         A C
C301      1420152000  1         B
C301      1420238100  3         A B C
C301      1420324500  5         A B C D E

I've written a long function but its not very efficient and very time consuming. It needs to be scalable across huge data sets. thanks for help in advance


Solution

  • You can try a combination of melt from reshape2 and aggregate. After melting the dataset, isolate the values equaling 1 to aggregate by Server and Epoch. To sum the variables in the column we use length, and toString for the list of Items:

    library(reshape2)
    m <- melt(df1, c("Server", "Epoch"))
    aggregate(variable~Server+Epoch, m[m$value==1,], FUN=function(x) cbind(length(x), toString(x)))
    # Server      Epoch variable.1    variable.2
    # 1   C301 1420100400          2          A, C
    # 2   C301 1420152000          1             B
    # 3   C301 1420238100          3       A, B, C
    # 4   C301 1420324500          5 A, B, C, D, E