
Error with arules::read.transactions(): "'cols' does not match entries in header of file" even though they do match


I have a text file like the following:

transactionID item
T100 l1,l2,l3
T200 l2,l4
T300 l2,l3
T400 l1,l2,l3
T500 l1,l3
T600 l2,l3
T700 l1,l3
T800 l1,l2,l3,l5
T900 l1,l2,l3

I would like to read it as a transactions object for arules. I used the following call:

transacciones <- read.transactions(file = "/home/norhther/Escritorio/trans.txt",
                                   format = "single",
                                   sep = ",",
                                   header = TRUE,
                                   cols = c("transactionID", "item"),
                                   rm.duplicates = TRUE)

However, I get the following error:

Error in read.transactions(file = "/home/norhther/Escritorio/trans.txt", : 
'cols' does not match entries in header of file.

Solution

  • Edit

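    The call fails because, with sep = ",", the header line transactionID item contains no comma and is read as a single field, so neither name in cols matches it. Your file also has one transaction per line, which is the basket layout rather than single. Change format to "basket", use sep = " ", skip the header line, and set cols = 1 so that the first field is taken as the transaction ID: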

    text = 'transactionID item
    T100 l1,l2,l3
    T200 l2,l4
    T300 l2,l3
    T400 l1,l2,l3
    T500 l1,l3
    T600 l2,l3
    T700 l1,l3
    T800 l1,l2,l3,l5
    T900 l1,l2,l3'
    
    write(text, file = "trans.txt")
    
    library(arules)
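    # Read the same file that was just written to the working directory
    transacciones <- read.transactions(file = "trans.txt",
                                       format = "basket",  # one transaction per line
                                       sep = " ",          # separates the ID from the item field
                                       skip = 1,           # skip the header line
                                       cols = 1,           # column 1 holds the transaction IDs
                                       rm.duplicates = TRUE)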
    
    inspect(transacciones)
    #>     items         transactionID
    #> [1] {l1,l2,l3}    T100         
    #> [2] {l2,l4}       T200         
    #> [3] {l2,l3}       T300         
    #> [4] {l1,l2,l3}    T400         
    #> [5] {l1,l3}       T500         
    #> [6] {l2,l3}       T600         
    #> [7] {l1,l3}       T700         
    #> [8] {l1,l2,l3,l5} T800         
    #> [9] {l1,l2,l3}    T900
    

    Created on 2022-11-20 with reprex v2.0.2
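
    One caveat worth checking, offered as a hedged note rather than as part of the original answer: with sep = " ", each line is split into just two fields, so the item field l1,l2,l3 appears to be stored as a single item label, and inspect() prints it in a way that looks like three items. Calling itemLabels(transacciones) should show whether that is the case. The sketch below is one way around it under that assumption: it splits the IDs and the items in base R and converts the resulting named list to transactions. The temporary file and the variable names (lines_in, item_list) are illustrative, and the arules step is guarded so that the parsing part runs even where arules is not installed.

    ```r
    # Illustrative data: the same rows as trans.txt, written to a temporary file
    lines_in <- c("transactionID item",
                  "T100 l1,l2,l3", "T200 l2,l4", "T300 l2,l3",
                  "T400 l1,l2,l3", "T500 l1,l3", "T600 l2,l3",
                  "T700 l1,l3", "T800 l1,l2,l3,l5", "T900 l1,l2,l3")
    path <- tempfile(fileext = ".txt")
    writeLines(lines_in, path)

    # Drop the header, then split each line into the ID and the item field
    raw   <- readLines(path)[-1]
    parts <- strsplit(raw, " ", fixed = TRUE)
    ids   <- vapply(parts, `[`, character(1), 1)

    # Split the item field on commas and drop duplicated items per transaction
    item_list <- lapply(parts, function(p) {
      unique(strsplit(p[2], ",", fixed = TRUE)[[1]])
    })
    names(item_list) <- ids

    # Convert to transactions only if arules is available
    if (requireNamespace("arules", quietly = TRUE)) {
      library(arules)
      transacciones <- as(item_list, "transactions")
      print(itemLabels(transacciones))  # individual items rather than joined strings
    }
    ```

    With this approach each of l1 to l5 is its own item, which is what rule mining with apriori() normally needs.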


    According to the documentation of the function read.transactions, you can use the argument cols:

    For the single format, cols is a numeric or character vector of length two giving the numbers or names of the columns (fields) with the transaction and item ids, respectively. If character, the first line of file is assumed to be a header with column names. For the basket format, cols can be a numeric scalar giving the number of the column (field) with the transaction ids. If cols = NULL, the data do not contain transaction ids.
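
    To make the single layout described above concrete, here is a small sketch, not taken from the original answer, that reshapes a few of the question's rows into one transaction/item pair per line. The three sample rows, the temporary file and the name long are illustrative assumptions, and the read.transactions() call is guarded so that the reshaping runs without arules installed.

    ```r
    # Three sample rows in the question's layout (ID, then comma-separated items)
    rows  <- c("T100 l1,l2,l3", "T200 l2,l4", "T300 l2,l3")
    parts <- strsplit(rows, " ", fixed = TRUE)

    # One row per (transactionID, item) pair
    long <- do.call(rbind, lapply(parts, function(p) {
      data.frame(transactionID = p[1],
                 item = strsplit(p[2], ",", fixed = TRUE)[[1]])
    }))

    # Write a space-separated file with a header line
    path <- tempfile(fileext = ".txt")
    write.table(long, path, sep = " ", quote = FALSE, row.names = FALSE)
    print(readLines(path))

    # In this layout, character column names in cols can match the header
    if (requireNamespace("arules", quietly = TRUE)) {
      library(arules)
      tr <- read.transactions(path, format = "single", sep = " ",
                              header = TRUE,
                              cols = c("transactionID", "item"))
      inspect(tr)
    }
    ```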

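    So you can specify the columns with a numeric vector such as c(1, 2), which avoids the header-name check. Be aware that with sep = "," this only silences the error: as the output below shows, the IDs come out as "T100 l1" and each transaction keeps a single item, so the data are not parsed as intended. Here is a reproducible example: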

    text = 'transactionID item
    T100 l1,l2,l3
    T200 l2,l4
    T300 l2,l3
    T400 l1,l2,l3
    T500 l1,l3
    T600 l2,l3
    T700 l1,l3
    T800 l1,l2,l3,l5
    T900 l1,l2,l3'
    cat(text)
    #> transactionID item
    #> T100 l1,l2,l3
    #> T200 l2,l4
    #> T300 l2,l3
    #> T400 l1,l2,l3
    #> T500 l1,l3
    #> T600 l2,l3
    #> T700 l1,l3
    #> T800 l1,l2,l3,l5
    #> T900 l1,l2,l3
    write(text, file = "trans.txt")
    
    library(arules)
    transacciones <- read.transactions(file = "trans.txt", # the file written above; adjust the path if needed
                                       format = "single",
                                       sep = ",",
                                       header = TRUE,
                                       cols = c(1, 2),
                                       rm.duplicates = TRUE)
    
    inspect(transacciones)
    #>     items transactionID
    #> [1] {l2}  T100 l1      
    #> [2] {l4}  T200 l2      
    #> [3] {l3}  T300 l2      
    #> [4] {l2}  T400 l1      
    #> [5] {l3}  T500 l1      
    #> [6] {l3}  T600 l2      
    #> [7] {l3}  T700 l1      
    #> [8] {l2}  T800 l1      
    #> [9] {l2}  T900 l1
    

    Created on 2022-11-19 with reprex v2.0.2
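
    The transaction IDs such as T100 l1 in the output above, and the error in the question, both follow from how the comma separator splits each line. The base-R sketch below imitates that splitting with strsplit(); it is an approximation for illustration and not the actual arules code path.

    ```r
    header <- "transactionID item"
    row    <- "T100 l1,l2,l3"

    # Split both lines on commas, as sep = "," would
    header_fields <- strsplit(header, ",", fixed = TRUE)[[1]]
    row_fields    <- strsplit(row, ",", fixed = TRUE)[[1]]

    print(header_fields)
    # [1] "transactionID item"   (one field, since the header has no comma)

    print(match(c("transactionID", "item"), header_fields))
    # [1] NA NA                  (neither name matches the single header field)

    print(row_fields[1:2])
    # [1] "T100 l1" "l2"         (the two fields that cols = c(1, 2) selects)
    ```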