rmarket-basket-analysis

Melt basket dataframe to single dataframe without loops


I have a dataframe in basket format like this:

V1 <- c('milk', 'beer', 'wrench', 'milk' )
V2 <- c('eggs', 'elbow grease', '', 'beer')
V3 <- c('water', '', '', '')

df <- data.frame(V1, V2, V3)

output:

      V1      V2           V3
1   milk      eggs         water
2   beer      elbow grease      
3   wrench                   
4   milk      beer  

What I would like to produce is a dataframe in single format like this:

  transaction   product
1           1   milk
2           1   eggs
3           1   water
4           2   beer
5           2   elbow grease
6           3   wrench
7           4   milk
8           4   beer

For now, I want the data in a dataframe so I can filter before I switch to the transactions format that the apriori R package uses.

What is the fastest way to convert this dataframe from basket to single format?

Right now I'm using a loop which is very slow.

dfSingle <- data.frame(product = character(),
                    transaction = integer())
for (row in 1:nrow(df))  {
  # Create a list of products
  productList <- unname(unlist(df[row, ]))

  # Remove blank spaces
  productList <- productList[!productList %in% ""]

  # Convert to a dataframe
  dfTemp <- as.data.frame(productList)
  colnames(dfTemp) <- "product"
  dfTemp$transaction <- row

  # Bind to larger dataframe with previous rows
  dfSingle <- rbind(dfSingle, dfTemp)

}

I've thought about using apply to apply this function to each row, but I'm confused about how to bind the multiple resulting rows to the results of the previous rows.


Solution

  • or a data.table approach (one-liner)

    First melt get the transaction from the rownames: setDT(df)[, transaction := .I ] Then melt, using transaction as id-column: melt( ... , id = "transaction" ) And finally drop the empty values and return the first and third column: ...[!value == "", c(1,3) ]

    melt( setDT(df)[, transaction := .I ], id = "transaction" )[!value == "", c(1,3) ]
    
    #    transaction        value
    # 1:           1         milk
    # 2:           2         beer
    # 3:           3       wrench
    # 4:           4         milk
    # 5:           1         eggs
    # 6:           2 elbow grease
    # 7:           4         beer
    # 8:           1        water