Tags: r, apriori, arules

How can I find suitable support and confidence values in apriori to get rules?


I am doing item association analysis on transaction data, using the arules package in R to build the rules. My sample data is available at this link: https://1drv.ms/u/s!Ak1rt2E1f2gFgV9t7hMVAn0P4gd0

library(arules)
library(arulesViz)
df = read.csv("trans.csv")
trans = as(split(df[,"Item"], df[,"Billno"]), "transactions")
inspect(trans[1:20])
summary(trans)
rules1 = apriori(trans, parameter = list(support = 0.6, confidence = 0.6,
                                         target = "rules"))
summary(rules1) ##Output is "Set of 0 rules"

This is the output I get:

summary(rules1)

set of 0 rules

Before posting, I read https://stats.stackexchange.com/questions/56034/association-analysis-returns-0-useful-rules. I also tried various values for support and confidence, but nothing works.


Solution

  • Problems with finding the right minimum support and minimum confidence values, and ending up with 0 frequent itemsets or 0 association rules, are quite common. Read this if you need a refresher on what support and confidence mean.
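As a quick refresher, both measures can be computed by hand. The base-R sketch below uses a small hypothetical set of five transactions (not your data) and two helper functions, `toy_support()` and `toy_confidence()`, written only for this illustration; they are named so that they do not mask the `support()` function from arules. Support is the fraction of transactions that contain an itemset, and confidence of X => Y is support(X together with Y) divided by support(X):

```r
# Hypothetical toy data: five transactions, each a character vector of items
toy_trans <- list(
  c("bread", "milk"),
  c("bread", "butter"),
  c("bread", "milk", "butter"),
  c("milk"),
  c("bread", "milk")
)

# support(X): fraction of transactions that contain every item of X
toy_support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# confidence(X => Y) = support(X and Y together) / support(X)
toy_confidence <- function(lhs, rhs, transactions) {
  toy_support(union(lhs, rhs), transactions) / toy_support(lhs, transactions)
}

s  <- toy_support(c("bread", "milk"), toy_trans)   # in 3 of the 5 transactions
cf <- toy_confidence("bread", "milk", toy_trans)   # 3 of the 4 bread transactions
print(c(support = s, confidence = cf))
```

A minimum support of 0.6 would already be hard to reach in this tiny example; with thousands of mostly distinct items it is out of reach.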

    Let's look at your transaction data first:

    summary(trans)
    transactions as itemMatrix in sparse format with
     2531 rows (elements/itemsets/transactions) and
     6632 columns (items) and a density of 0.0005951533 
    
    most frequent items:
    AR845311 AR800369 AR828249 AR839869 AR831167  (Other) 
          84       35       31       29       24     9787 
    
    element (itemset/transaction) length distribution:
    sizes
       1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
     767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
     23  24  25  27  28  32  34  36  48 
      3   4   2   3   1   1   1   1   1 
    
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1.000   1.000   2.000   3.947   5.000  48.000 
    

    The first issue to deal with is minimum support. The summary says that your most frequent item (AR845311) occurs only 84 times in the data set, and your items in general have very low support:

    summary(itemFrequency(trans))
    
         Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
    0.0003951 0.0003951 0.0003951 0.0005952 0.0003951 0.0331900 
    

    You use a minimum support of 0.6, but even the most frequent single item only has a support of 0.033, and an itemset can never have a higher support than any of its items. You need to reduce your minimum support. If you want to find itemsets/rules that occur in at least 10 transactions, you could set the minimum support to:

     10/length(trans)
    
     [1] 0.003951008
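The arithmetic behind this advice can be sketched in base R using the counts reported by `summary(trans)` (2531 transactions, 84 occurrences of AR845311). It shows how many transactions an itemset would need to appear in to reach a support of 0.6, compared with what the most frequent item actually achieves:

```r
n_trans <- 2531       # number of transactions, from summary(trans)
top_item_count <- 84  # occurrences of the most frequent item, AR845311

# Relative support of the most frequent single item
top_item_support <- top_item_count / n_trans

# Number of transactions an itemset must appear in to reach support >= 0.6
needed_count <- ceiling(0.6 * n_trans)

# Minimum support that corresponds to "at least 10 transactions"
supp_10 <- 10 / n_trans

print(c(top_item_support = top_item_support,
        needed_count = needed_count,
        supp_10 = supp_10))
```

An itemset would need to appear in 1519 transactions, while the most frequent single item appears in only 84, so no itemset (and therefore no rule) can pass a minimum support of 0.6.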
    

    The second issue is that your data is very sparse (the summary shows a density of about 0.0006). This means that your transactions are rather short, i.e., they contain only a few items:

    table(size(trans))
    
      1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21 
    767 509 306 238 160 112 100  52  69  50  31  27  18  12  13  15   9  10   7   5   4 
     23  24  25  27  28  32  34  36  48 
      3   4   2   3   1   1   1   1   1 
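To put the sparsity in numbers, the size table above can be copied into base R (values taken from the output) to recover the number of transactions, the total number of item occurrences, the mean transaction length, the share of single-item transactions (which can never contain both sides of a rule with a non-empty left-hand side), and the density reported by `summary(trans)`:

```r
# Transaction sizes and how many transactions have each size (from table(size(trans)))
sizes  <- c(1:21, 23, 24, 25, 27, 28, 32, 34, 36, 48)
counts <- c(767, 509, 306, 238, 160, 112, 100, 52, 69, 50, 31, 27, 18, 12, 13,
            15, 9, 10, 7, 5, 4, 3, 4, 2, 3, 1, 1, 1, 1, 1)
n_items <- 6632  # number of distinct items, from summary(trans)

n_trans <- sum(counts)                 # number of transactions
n_occurrences <- sum(sizes * counts)   # total item occurrences in the data
mean_size <- n_occurrences / n_trans   # average transaction length
single_share <- counts[sizes == 1] / n_trans   # share of one-item transactions
item_density <- n_occurrences / (n_trans * n_items)  # filled share of the item matrix

print(c(n_trans = n_trans, mean_size = mean_size,
        single_share = single_share, item_density = item_density))
```

The recomputed mean (about 3.95 items) and density (about 0.0006) match the summary: roughly 30% of the transactions contain a single item, and an average transaction touches only about 4 of the 6632 items.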
    

    Short transactions mean that the confidence of rules will probably be low. For your data it turns out to be very low, so I start with a minimum confidence of 0.

    rules <- apriori(trans,
        parameter = list(support = 0.004, confidence = 0, target = "rules"))
    Apriori
    
    Parameter specification:
     confidence minval smax arem  aval originalSupport maxtime support minlen maxlen
              0    0.1    1 none FALSE            TRUE       5   0.004      1     10
     target   ext
      rules FALSE
    
    Algorithmic control:
     filter tree heap memopt load sort verbose
        0.1 TRUE TRUE  FALSE TRUE    2    TRUE
    
    Absolute minimum support count: 10 
    
    set item appearances ...[0 item(s)] done [0.00s].
    set transactions ...[6632 item(s), 2531 transaction(s)] done [0.00s].
    sorting and recoding items ... [40 item(s)] done [0.00s].
    creating transaction tree ... done [0.00s].
    checking subsets of size 1 2 done [0.00s].
    writing ... [46 rule(s)] done [0.00s].
    creating S4 object  ... done [0.00s].
    summary(rules)
    set of 46 rules
    
    rule length distribution (lhs + rhs):sizes
     1  2 
    40  6 
    
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
       1.00    1.00    1.00    1.13    1.00    2.00 
    
    summary of quality measures:
        support           confidence            lift            count      
     Min.   :0.004346   Min.   :0.004346   Min.   : 1.000   Min.   :11.00  
     1st Qu.:0.004741   1st Qu.:0.004840   1st Qu.: 1.000   1st Qu.:12.00  
     Median :0.005531   Median :0.005729   Median : 1.000   Median :14.00  
     Mean   :0.006803   Mean   :0.057301   Mean   : 3.316   Mean   :17.22  
     3rd Qu.:0.007112   3rd Qu.:0.008890   3rd Qu.: 1.000   3rd Qu.:18.00  
     Max.   :0.033188   Max.   :0.705882   Max.   :21.269   Max.   :84.00  
    
    mining info:
      data ntransactions support confidence
     trans          2531   0.004          0
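Note that 40 of the 46 rules have length 1, i.e. they have an empty left-hand side (`{} => {item}`). Every transaction contains the empty itemset, so its support is 1, and the confidence of such a rule is simply the support of the item. That is why the minimum confidence in the summary (0.004346) equals the minimum support. The base-R sketch below checks this identity on a hypothetical toy data set (with a toy `toy_support()` helper written only for this illustration) and then with the smallest rule count (11) and the count of AR845311 (84) from the summary. If you only want rules with a non-empty left-hand side, you can add `minlen = 2` to the `parameter` list of `apriori()`.

```r
# Hypothetical toy transactions (not the question's data)
toy_trans <- list(c("a", "b"), c("a"), c("c"), c("a", "c"))

# support(X): fraction of transactions that contain every item of X
toy_support <- function(itemset, transactions) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# The empty itemset is contained in every transaction, so its support is 1
supp_empty <- toy_support(character(0), toy_trans)
# Hence confidence({} => {a}) = supp({a}) / supp({}) = supp({a})
conf_empty_a <- toy_support("a", toy_trans) / supp_empty

# Same identity with the counts from summary(rules): 2531 transactions,
# smallest rule count 11, count of AR845311 = 84
conf_from_counts <- c(11, 84) / 2531
print(round(conf_from_counts, 6))
```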
    

    The results show that there is at least one rule with a confidence of about 0.7 (the maximum is 0.706). You can run apriori again with a higher minimum confidence. Here are the rules with the highest confidence:

    inspect(head(rules, by = "confidence"))
        lhs           rhs        support     confidence lift     count
    [1] {AR835501} => {AR845311} 0.004741209 0.7058824  21.26891 12   
    [2] {AR743988} => {AR845311} 0.004346108 0.6470588  19.49650 11   
    [3] {AR800369} => {AR845311} 0.007111814 0.5142857  15.49592 18   
    [4] {AR845311} => {AR800369} 0.007111814 0.2142857  15.49592 18   
    [5] {AR845311} => {AR835501} 0.004741209 0.1428571  21.26891 12   
    [6] {AR845311} => {AR743988} 0.004346108 0.1309524  19.49650 11 
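The quality measures in this table follow directly from the counts. The base-R sketch below recomputes rules [3] and [4] from the numbers reported above (2531 transactions, 84 transactions with AR845311, 35 with AR800369, 18 with both). It shows that confidence depends on the direction of the rule, while support and lift are the same in both directions:

```r
n_trans <- 2531
count_845311 <- 84  # transactions containing AR845311
count_800369 <- 35  # transactions containing AR800369
count_both   <- 18  # transactions containing both items

supp_both <- count_both / n_trans

# confidence(X => Y) = count(X and Y) / count(X)
conf_800369_to_845311 <- count_both / count_800369  # rule [3]
conf_845311_to_800369 <- count_both / count_845311  # rule [4]

# lift(X => Y) = supp(X and Y) / (supp(X) * supp(Y)), symmetric in X and Y
lift_pair <- supp_both / ((count_800369 / n_trans) * (count_845311 / n_trans))

print(round(c(support = supp_both,
              conf_rule3 = conf_800369_to_845311,
              conf_rule4 = conf_845311_to_800369,
              lift = lift_pair), 7))
```

The high lift (about 15.5) shows that the two items co-occur far more often than independence would predict, even though rule [4] has a low confidence because AR845311 is much more frequent than AR800369.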
    

    Complete examples of how to use association rule mining can be found here.

    Hope this helps!