I am using mlxtend to find association rules. Here is the code:
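from mlxtend.frequent_patterns import apriori, association_rules

# dum_data is assumed to be the one-hot encoded transaction DataFrame
df = apriori(dum_data, min_support=0.4, use_colnames=True)
rules = association_rules(df, metric="lift", min_threshold=1)
rules2 = rules[(rules['lift'] >= 1) & (rules['confidence'] >= 0.7)]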
Output:
antecedents            consequents       antecedentsupport  consequentsupport  support  confidence  lift   leverage  conviction
frozenset({'C'})       frozenset({'B'})  0.63               0.705              0.45     0.726       1.030  0.013     1.077
frozenset({'A'})       frozenset({'B'})  0.98               0.705              0.69     0.70        1.003  0.0007    1.00081
frozenset({'A', 'C'})  frozenset({'B'})  0.63               0.705              0.45     0.72        1.030  0.013     1.0776
I have set min_support=0.4. What is the difference between antecedentsupport, consequentsupport and support?
What do lift and leverage mean?
How do I judge whether a value is good or bad?
Confidence I think I understand: it is how many times C and B occurred together for the first rule in the output. Is that correct?
Let's take the third rule ({A,C} => {B}) as an example:
support = support of {A, B, C}. Support means that you count the number of transactions that contain all three of A, B and C and divide it by the total number of transactions.
antecedentsupport = support of what precedes the =>, i.e. the support of {A,C}
consequentsupport = support of what comes after the =>, i.e. the support of {B}
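confidence = how likely it is that a transaction in which we observed {A,C} additionally contains {B}. Think of it as the conditional probability p(B given {A,C}), i.e. the support of {A, B, C} divided by the support of {A,C}. For the first rule, that is the share of transactions containing C that also contain B, not simply the number of times C and B occurred together.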
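The definitions above can be sketched in a few lines of plain Python. This is only an illustration: the five transactions are made-up toy data (not the data from the question), and the `support` helper is a hypothetical function written for this example, not part of mlxtend.

```python
# Hypothetical toy data: each transaction is a set of items.
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"A", "B", "C"},
    {"B"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

antecedent = {"A", "C"}  # what precedes the =>
consequent = {"B"}       # what comes after the =>

antecedent_support = support(antecedent, transactions)         # support of {A, C}
consequent_support = support(consequent, transactions)         # support of {B}
rule_support = support(antecedent | consequent, transactions)  # support of {A, B, C}
confidence = rule_support / antecedent_support                 # p(B given {A, C})

print(antecedent_support, consequent_support, rule_support)  # 0.6 0.8 0.4
print(round(confidence, 3))                                  # 0.667
```

In this toy data {A,C} appears in 3 of 5 transactions and {A, B, C} in 2 of 5, so the confidence of {A,C} => {B} is 2/3.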
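Lift: The definition of lift can be found e.g. on Wikipedia. It is the confidence divided by the consequentsupport, i.e. support of {A, B, C} / (support of {A,C} * support of {B}). This means that if lift < 1, then {A,C} and {B} occur together less often than would be expected if they were independent. If lift is larger than 1, then {A,C} and {B} appear together more often than expected.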
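Leverage is roughly the same. It also compares the observed co-occurrence with the expected one, but as a difference rather than a ratio: support of {A, B, C} - support of {A,C} * support of {B}. A leverage of 0 corresponds to independence. Further explanation e.g. here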
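As a rough check, lift and leverage for the first rule in the question ({C} => {B}) can be recomputed from the printed figures. This sketch uses the usual definitions, lift = support / (antecedentsupport * consequentsupport) and leverage = support - antecedentsupport * consequentsupport. It assumes the printed support of 0.45 is a rounded value, so support is derived from the printed confidence instead; the variable names are my own.

```python
# Figures printed for the rule {C} => {B} in the question's output.
antecedent_support = 0.63   # support of {C}
consequent_support = 0.705  # support of {B}
confidence = 0.726          # p(B given C)

# Assumption: the printed support (0.45) is rounded, so recover it from
# confidence = support / antecedent_support.
support = confidence * antecedent_support

# Co-occurrence expected if antecedent and consequent were independent.
expected = antecedent_support * consequent_support

lift = support / expected      # ratio of observed to expected
leverage = support - expected  # difference of observed and expected

print(round(lift, 3), round(leverage, 3))  # 1.03 0.013
```

Both values agree with the lift and leverage columns in the output. A lift this close to 1 and a leverage this close to 0 mean that C and B co-occur only slightly more often than independence would predict.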
What makes a good lift/leverage is subjective, but I'd suggest a lift > 1. When it comes to rules, I would look more at confidence.