  Server      Epoch A B C D E
1   C301 1420100400 1 0 1 0 0
2   C301 1420100700 0 0 0 0 0
3   C301 1420152000 0 1 0 0 0
4   C301 1420238100 1 1 1 0 0
5   C301 1420324500 1 1 1 1 1
I need help getting the matrix above into basket or transaction form (to use with the cSpade algorithm in the arulesSequences package) such that every "1" in the matrix becomes a transaction item, i.e., the output would look something like this:
Server Epoch      #items Items
C301   1420100400 2      A C
C301   1420152000 1      B
C301   1420238100 3      A B C
C301   1420324500 5      A B C D E
I've written a long function, but it's inefficient and very time-consuming. It needs to scale to huge data sets. Thanks in advance for any help.
You can try a combination of melt from reshape2 and aggregate. After melting the dataset, keep only the rows where value equals 1, then aggregate by Server and Epoch. To count the items in each group we use length, and toString for the list of Items:
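library(reshape2)
# Long format: one row per Server/Epoch/item combination, with its 0/1 flag in "value"
m <- melt(df1, id.vars = c("Server", "Epoch"))
# Keep only the 1s, then count and list the items for each Server/Epoch pair
aggregate(variable ~ Server + Epoch, m[m$value == 1, ], FUN = function(x) cbind(length(x), toString(x)))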
#   Server      Epoch variable.1    variable.2
# 1   C301 1420100400          2          A, C
# 2   C301 1420152000          1             B
# 3   C301 1420238100          3       A, B, C
# 4   C301 1420324500          5 A, B, C, D, E
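Since the question asks for something that scales, here is a rough base R sketch of the same idea that skips the melt step and works on the 0/1 columns as a logical matrix. The construction of df1 is my own reconstruction of the data in the question, and the names item_cols, flags, n_items and baskets are just illustrative; I have not benchmarked it against the melt/aggregate version.

```r
# Hypothetical reconstruction of the question's data (df1 is not defined in the thread)
df1 <- data.frame(
  Server = "C301",
  Epoch  = c(1420100400, 1420100700, 1420152000, 1420238100, 1420324500),
  A = c(1, 0, 0, 1, 1),
  B = c(0, 0, 1, 1, 1),
  C = c(1, 0, 0, 1, 1),
  D = c(0, 0, 0, 0, 1),
  E = c(0, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Every column other than the two identifiers is treated as an item flag
item_cols <- setdiff(names(df1), c("Server", "Epoch"))
flags <- as.matrix(df1[item_cols]) == 1   # logical matrix, one row per event

# Number of items per row, and the space-separated item names per row
n_items <- rowSums(flags)
items <- apply(flags, 1, function(r) paste(item_cols[r], collapse = " "))

baskets <- data.frame(df1[c("Server", "Epoch")],
                      n_items = n_items,
                      Items = items,
                      stringsAsFactors = FALSE)

# Drop events that contain no items at all (the all-zero row in the question)
baskets <- baskets[baskets$n_items > 0, ]
rownames(baskets) <- NULL
print(baskets)
```

Here the items are separated by spaces rather than ", ", which matches the layout asked for in the question. If a file is needed, write.table(baskets, quote = FALSE, row.names = FALSE, col.names = FALSE) should produce one space-delimited line per event, though whether that matches what arulesSequences expects for its identifiers is something to check against the package documentation.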