r arules

How to convert a data frame to arules' transaction object


I'm trying to mine association rules from a dataset using the arules library in R. The dataset has a transaction column and 5 item columns. I want to turn the data into a list to pass to arules, but because there is more than one item column I'm not sure how to go about this.

My data set looks like this:

Transaction    Item1           Item2        Item3

12/09/2001     lipstick        Bronzer      Mascara
2/09/2001      Eyeshadow       lipstick
13/09/2002     Powder          Remover
14/09/2003     Nail varnish    Lip gloss    Eyeliner

The code I would usually use for one transaction column and one item column is below.

library(arules)
Transactions <- split(data$item, data$transaction)

basketanalysis <- as(Transactions, "transactions")

Any help would be hugely appreciated.


Solution

  • Here is what I tried. I think you need to reshape your data into a list with one element per transaction. First, I created a transaction ID, in case the values in Transaction are not unique. Then I pivoted the data to a long-format data frame, so that all products sit in one column, removed the rows where the product is NA, and converted the products to a factor. For each group (transaction ID), I collected the products into a list. The result, x, has a list column called whatever; this is the list to use to create a transactions object.

    library(tidyverse)
    library(arules)
    
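    # Add a row-wise transaction ID, reshape to long format, drop missing
    # products, then collect each transaction's products into a list column
    x <- mydata %>% 
      mutate(transaction_id = 1:n()) %>% 
      pivot_longer(cols = contains("Item"), names_to = "item", values_to = "product") %>% 
      filter(complete.cases(product)) %>% 
      mutate(product = factor(product)) %>% 
      group_by(transaction_id) %>% 
      summarize(whatever = list(product))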
    
    # Name each list element with its transaction ID
    names(x$whatever) <- x$transaction_id
    
    > x$whatever
    $`1`
    [1] lipstick Bronzer  Mascara 
    Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover
    
    $`2`
    [1] Eyeshadow lipstick 
    Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover
    
    $`3`
    [1] Powder  Remover
    Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover
    
    $`4`
    [1] Nail varnish Lip gloss    Eyeliner    
    Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover
    

    Finally, I created a transactions-class object.

    mybasket <- as(x$whatever, "transactions")
    
    > summary(mybasket)
    transactions as itemMatrix in sparse format with
     4 rows (elements/itemsets/transactions) and
     9 columns (items) and a density of 0.2777778 
    
    most frequent items:
     lipstick   Bronzer  Eyeliner Eyeshadow Lip gloss   (Other) 
            2         1         1         1         1         4 
    
    element (itemset/transaction) length distribution:
    sizes
    2 3 
    2 2 
    
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
       2.0     2.0     2.5     2.5     3.0     3.0 
    
    includes extended item information - examples:
         labels
    1   Bronzer
    2  Eyeliner
    3 Eyeshadow
    
    includes extended transaction information - examples:
      transactionID
    1             1
    2             2
    3             3
    

    DATA

    mydata <- structure(list(Transaction = c("12/09/2001", "2/09/2001", "13/09/2002", 
    "14/09/2003"), Item1 = c("lipstick", "Eyeshadow", "Powder", "Nail varnish"
    ), Item2 = c("Bronzer", "lipstick", "Remover", "Lip gloss"), 
    Item3 = c("Mascara", NA, NA, "Eyeliner")), row.names = c(NA, 
    -4L), class = c("tbl_df", "tbl", "data.frame"))