I have a text file like the following:
transactionID item
T100 l1,l2,l3
T200 l2,l4
T300 l2,l3
T400 l1,l2,l3
T500 l1,l3
T600 l2,l3
T700 l1,l3
T800 l1,l2,l3,l5
T900 l1,l2,l3
I would like to read it as transactions for arules. I used the following:
transacciones <- read.transactions(file = "/home/norhther/Escritorio/trans.txt",
format = "single",
sep = ",",
header = TRUE,
cols = c("transactionID", "item"),
rm.duplicates = TRUE)
However, I get the following error:
Error in read.transactions(file = "/home/norhther/Escritorio/trans.txt", :
'cols' does not match entries in header of file.
Edit
You should change format to "basket", use sep = " " as the separator, skip the header line with skip = 1, and set cols = 1 so that the first field is taken as the transaction ID, like this:
text = 'transactionID item
T100 l1,l2,l3
T200 l2,l4
T300 l2,l3
T400 l1,l2,l3
T500 l1,l3
T600 l2,l3
T700 l1,l3
T800 l1,l2,l3,l5
T900 l1,l2,l3'
write(text, file = "trans.txt")
library(arules)
transacciones <- read.transactions(file = "trans.txt",  # the file written above
                                   format = "basket",
                                   sep = " ",
                                   skip = 1,            # skip the header line
                                   cols = 1,            # column 1 holds the transaction IDs
                                   rm.duplicates = TRUE)
inspect(transacciones)
#> items transactionID
#> [1] {l1,l2,l3} T100
#> [2] {l2,l4} T200
#> [3] {l2,l3} T300
#> [4] {l1,l2,l3} T400
#> [5] {l1,l3} T500
#> [6] {l2,l3} T600
#> [7] {l1,l3} T700
#> [8] {l1,l2,l3,l5} T800
#> [9] {l1,l2,l3} T900
Created on 2022-11-20 with reprex v2.0.2
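One caveat: with sep = " " each line is split only at the space, so the string l1,l2,l3 is most likely stored as a single item label rather than as three separate items. The inspect() output looks the same in both cases, so checking itemLabels(transacciones) should tell which one you have. If the items are not split, one possible workaround is to parse the lines yourself and coerce the resulting named list to transactions. This is only a minimal sketch that assumes the file layout shown above; the helper name parse_baskets is hypothetical, and the inline lines stand in for readLines("trans.txt").

```r
# Hypothetical helper: turn "ID item1,item2,..." lines into a named list.
parse_baskets <- function(lines) {
  lines <- lines[nzchar(trimws(lines))]        # drop empty lines
  parts <- strsplit(trimws(lines), " +")       # separate the ID from the item string
  ids   <- vapply(parts, `[`, character(1), 1)
  items <- lapply(parts, function(p) {
    unique(strsplit(p[2], ",", fixed = TRUE)[[1]])  # split items, drop duplicates
  })
  names(items) <- ids                          # list names become transaction IDs
  items
}

# Stand-in for readLines("trans.txt"); the first line is the header.
lines <- c("transactionID item",
           "T100 l1,l2,l3",
           "T200 l2,l4",
           "T800 l1,l2,l3,l5")
baskets <- parse_baskets(lines[-1])            # skip the header line

# Coerce to transactions only if arules is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  print(itemLabels(trans))                     # should list individual items
}
```

The names of the list are used as transaction IDs by the coercion, which replaces the job that cols = 1 does in read.transactions.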
According to the documentation of read.transactions, you can use the cols argument:
For the single format, cols is a numeric or character vector of length two giving the numbers or names of the columns (fields) with the transaction and item ids, respectively. If character, the first line of file is assumed to be a header with column names. For the basket format, cols can be a numeric scalar giving the number of the column (field) with the transaction ids. If cols = NULL, the data do not contain transaction ids.
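The names in cols did not match because, with sep = ",", the header line transactionID item is read as a single field. You can instead specify the columns by position with a numeric vector such as c(1, 2). Here is a reproducible example: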
text = 'transactionID item
T100 l1,l2,l3
T200 l2,l4
T300 l2,l3
T400 l1,l2,l3
T500 l1,l3
T600 l2,l3
T700 l1,l3
T800 l1,l2,l3,l5
T900 l1,l2,l3'
cat(text)
#> transactionID item
#> T100 l1,l2,l3
#> T200 l2,l4
#> T300 l2,l3
#> T400 l1,l2,l3
#> T500 l1,l3
#> T600 l2,l3
#> T700 l1,l3
#> T800 l1,l2,l3,l5
#> T900 l1,l2,l3
write(text, file = "trans.txt")
library(arules)
transacciones <- read.transactions(file = "trans.txt",  # the file written above
                                   format = "single",
                                   sep = ",",
                                   header = TRUE,
                                   cols = c(1, 2),
                                   rm.duplicates = TRUE)
inspect(transacciones)
#> items transactionID
#> [1] {l2} T100 l1
#> [2] {l4} T200 l2
#> [3] {l3} T300 l2
#> [4] {l2} T400 l1
#> [5] {l3} T500 l1
#> [6] {l3} T600 l2
#> [7] {l3} T700 l1
#> [8] {l2} T800 l1
#> [9] {l2} T900 l1
Created on 2022-11-19 with reprex v2.0.2
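The output above suggests that this call does not give the intended result: with sep = ",", the text before the first comma (for example T100 l1) ends up as the transaction ID and only the second field is kept as the item. The single format expects one (transaction ID, item) pair per line. Here is a minimal sketch, assuming the layout from the question, that reshapes the data into that long form first and then reads it with named columns. The temporary file and the variable names are for illustration only.

```r
# Wide input as in the question: one line per transaction (header removed).
wide <- c("T100 l1,l2,l3", "T200 l2,l4", "T500 l1,l3")

# Split each line into an ID and a vector of items.
parts <- strsplit(wide, " ", fixed = TRUE)
ids   <- vapply(parts, `[`, character(1), 1)
items <- strsplit(vapply(parts, `[`, character(1), 2), ",", fixed = TRUE)

# Long table: one (transactionID, item) pair per row.
long <- data.frame(
  transactionID = rep(ids, lengths(items)),
  item          = unlist(items),
  stringsAsFactors = FALSE
)

# Write it as a comma-separated file with a header line.
long_file <- tempfile(fileext = ".csv")        # temporary path, illustration only
write.csv(long, long_file, row.names = FALSE, quote = FALSE)

# Read it back in "single" format, only if arules is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  transacciones <- read.transactions(file = long_file,
                                     format = "single",
                                     sep = ",",
                                     header = TRUE,
                                     cols = c("transactionID", "item"),
                                     rm.duplicates = TRUE)
  inspect(transacciones)
}
```

Because the header line of the long file really does contain the two comma-separated names, the character form of cols from the question can be used here.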