I have a dataframe in basket format like this:
V1 <- c('milk', 'beer', 'wrench', 'milk' )
V2 <- c('eggs', 'elbow grease', '', 'beer')
V3 <- c('water', '', '', '')
df <- data.frame(V1, V2, V3)
output:
V1 V2 V3
1 milk eggs water
2 beer elbow grease
3 wrench
4 milk beer
What I would like to produce is a dataframe in single format like this:
transaction product
1 1 milk
2 1 eggs
3 1 water
4 2 beer
5 2 elbow grease
6 3 wrench
7 4 milk
8 4 beer
For now, I want the data in a dataframe so I can filter before I switch to the transactions format that the apriori R package uses.
What is the fastest way to convert this dataframe from basket to single format?
Right now I'm using a loop which is very slow.
dfSingle <- data.frame(product = character(),
transaction = integer())
for (row in 1:nrow(df)) {
# Create a list of products
productList <- unname(unlist(df[row, ]))
# Remove blank spaces
productList <- productList[!productList %in% ""]
# Convert to a dataframe
dfTemp <- as.data.frame(productList)
colnames(dfTemp) <- "product"
dfTemp$transaction <- row
# Bind to larger dataframe with previous rows
dfSingle <- rbind(dfSingle, dfTemp)
}
I've thought about using apply
to apply this function to each row, but I'm confused about how to bind the multiple resulting rows to the results of the previous rows.
or a data.table approach (one-liner)
First melt get the transaction from the rownames: setDT(df)[, transaction := .I ]
Then melt, using transaction as id-column: melt( ... , id = "transaction" )
And finally drop the empty values and return the first and third column: ...[!value == "", c(1,3) ]
melt( setDT(df)[, transaction := .I ], id = "transaction" )[!value == "", c(1,3) ]
# transaction value
# 1: 1 milk
# 2: 2 beer
# 3: 3 wrench
# 4: 4 milk
# 5: 1 eggs
# 6: 2 elbow grease
# 7: 4 beer
# 8: 1 water