I am doing item association analysis on transaction data. I am using the arules package in R to build the rules. My sample data is available at this link: https://1drv.ms/u/s!Ak1rt2E1f2gFgV9t7hMVAn0P4gd0
library(arules)
library(arulesViz)
df = read.csv("trans.csv")
trans = as(split(df[,"Item"], df[,"Billno"]), "transactions")
inspect(trans[1:20])
summary(trans)
rules1 = apriori(trans,parameter = list(support = 0.6, confidence = 0.6,
target = "rules"))
summary(rules1) ##Output is "Set of 0 rules"
The output I get is:
summary(rules1)
set of 0 rules
I referred to https://stats.stackexchange.com/questions/56034/association-analysis-returns-0-useful-rules before posting this. I also tried arbitrary values for support and confidence, but nothing works.
Issues with finding the right minimum support and minimum confidence values, and ending up with 0 frequent itemsets or 0 association rules, are quite common. Read this if you need a refresher on what support and confidence exactly mean.
Let's look at your transaction data first:
summary(trans)
transactions as itemMatrix in sparse format with
2531 rows (elements/itemsets/transactions) and
6632 columns (items) and a density of 0.0005951533
most frequent items:
AR845311 AR800369 AR828249 AR839869 AR831167  (Other)
      84       35       31       29       24     9787
element (itemset/transaction) length distribution:
sizes
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4
 23  24  25  27  28  32  34  36  48
  3   4   2   3   1   1   1   1   1
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.000 1.000 2.000 3.947 5.000 48.000
The first issue to deal with is minimum support. The summary says that your most frequent item (AR845311) occurs 84 times in the data set. Your items in general have very low support:
summary(itemFrequency(trans))
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0003951 0.0003951 0.0003951 0.0005952 0.0003951 0.0331900
You use a minimum support of 0.6, but the most frequent single item has a support of only 0.033! You need to reduce your minimum support. If you want to find itemsets/rules that occur at least 10 times in your data, you could set minimum support to:
10/length(trans)
[1] 0.003951008
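To make the gap concrete, here is a small base R sketch of the arithmetic. It uses only numbers from summary(trans) above; the variable names are just illustrative.

```r
# Numbers taken from summary(trans) above
n_trans   <- 2531   # number of transactions
top_count <- 84     # occurrences of the most frequent item (AR845311)

# Relative support of the most frequent single item
top_support <- top_count / n_trans

# Number of transactions an itemset must appear in to reach support = 0.6
needed_count <- ceiling(0.6 * n_trans)

# Relative support that corresponds to an absolute minimum count of 10
min_count   <- 10
min_support <- min_count / n_trans

print(top_support)
print(needed_count)
print(min_support)
```

Comparing needed_count with top_count shows why a support of 0.6 cannot produce any frequent itemset for this data.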
The second issue is that your data is very sparse (the summary shows a density of about 0.0006). This means that your transactions are rather short (i.e., they contain only a few items).
table(size(trans))
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21
767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4
 23  24  25  27  28  32  34  36  48
  3   4   2   3   1   1   1   1   1
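As a rough check, the density and the mean transaction length reported by summary(trans) can be reproduced from the item counts it prints. This is a base R sketch with illustrative variable names; it assumes density is the share of non-zero cells in the transactions-by-items matrix.

```r
# Numbers taken from summary(trans) above
n_trans <- 2531   # rows (transactions)
n_items <- 6632   # columns (distinct items)

# Counts of the five most frequent items plus the "(Other)" bucket
item_counts <- c(84, 35, 31, 29, 24, 9787)

# Total number of item occurrences across all transactions
total_occurrences <- sum(item_counts)

# Density: filled cells divided by all cells of the matrix
density <- total_occurrences / (n_trans * n_items)

# Mean transaction length: item occurrences per transaction
mean_size <- total_occurrences / n_trans

print(density)
print(mean_size)
```

With 6632 distinct items and fewer than 4 items per transaction on average, most pairs of items never appear together, which is why both support and confidence end up so low.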
Short transactions mean that the confidence of rules will probably be low. For your data it turns out to be very low, so I use a minimum confidence of 0 first.
rules <- apriori(trans,
  parameter = list(support = 0.004, confidence = 0, target = "rules"))
Apriori
Parameter specification:
confidence minval smax arem aval originalSupport maxtime support minlen maxlen
0 0.1 1 none FALSE TRUE 5 0.004 1 10
target ext
rules FALSE
Algorithmic control:
filter tree heap memopt load sort verbose
0.1 TRUE TRUE FALSE TRUE 2 TRUE
Absolute minimum support count: 10
set item appearances ...[0 item(s)] done [0.00s].
set transactions ...[6632 item(s), 2531 transaction(s)] done [0.00s].
sorting and recoding items ... [40 item(s)] done [0.00s].
creating transaction tree ... done [0.00s].
checking subsets of size 1 2 done [0.00s].
writing ... [46 rule(s)] done [0.00s].
creating S4 object ... done [0.00s].
summary(rules)
set of 46 rules
rule length distribution (lhs + rhs):sizes
1 2
40 6
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 1.00 1.00 1.13 1.00 2.00
summary of quality measures:
support confidence lift count
Min. :0.004346 Min. :0.004346 Min. : 1.000 Min. :11.00
1st Qu.:0.004741 1st Qu.:0.004840 1st Qu.: 1.000 1st Qu.:12.00
Median :0.005531 Median :0.005729 Median : 1.000 Median :14.00
Mean :0.006803 Mean :0.057301 Mean : 3.316 Mean :17.22
3rd Qu.:0.007112 3rd Qu.:0.008890 3rd Qu.: 1.000 3rd Qu.:18.00
Max. :0.033188 Max. :0.705882 Max. :21.269 Max. :84.00
mining info:
data ntransactions support confidence
trans 2531 0.004 0
The results show that there is at least one rule with a confidence of about 0.7. You can run APRIORI again with a higher minimum confidence. Here are the rules with the highest confidence:
inspect(head(rules, by = "confidence"))
    lhs           rhs        support     confidence lift     count
[1] {AR835501} => {AR845311} 0.004741209 0.7058824  21.26891 12
[2] {AR743988} => {AR845311} 0.004346108 0.6470588  19.49650 11
[3] {AR800369} => {AR845311} 0.007111814 0.5142857  15.49592 18
[4] {AR845311} => {AR800369} 0.007111814 0.2142857  15.49592 18
[5] {AR845311} => {AR835501} 0.004741209 0.1428571  21.26891 12
[6] {AR845311} => {AR743988} 0.004346108 0.1309524  19.49650 11
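To see where these numbers come from, rules [1] and [5] can be recomputed by hand in base R. The counts 12, 84 and 2531 are taken from the output above. The count for AR835501 alone (17) is not printed anywhere; it is an assumption inferred from count / confidence = 12 / 0.7058824.

```r
# Counts from the output above
n_trans    <- 2531  # number of transactions
count_both <- 12    # transactions containing AR835501 and AR845311
count_rhs  <- 84    # transactions containing AR845311

# ASSUMPTION: not shown in the output, inferred from 12 / 0.7058824
count_lhs  <- 17    # transactions containing AR835501

# Rule [1]: {AR835501} => {AR845311}
support    <- count_both / n_trans
confidence <- count_both / count_lhs
lift       <- confidence / (count_rhs / n_trans)

# Rule [5]: {AR845311} => {AR835501} (same itemset, sides swapped)
confidence_rev <- count_both / count_rhs
lift_rev       <- confidence_rev / (count_lhs / n_trans)

print(c(support = support, confidence = confidence, lift = lift))
print(c(confidence_rev = confidence_rev, lift_rev = lift_rev))
```

Support and lift are the same in both directions, while confidence depends on which item is on the left-hand side. That is why the rule pointing to the frequent item AR845311 has a much higher confidence than the reverse rule.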
Complete examples on how to use association rule mining can be found here.
Hope this helps!