python machine-learning data-mining

Using frequent itemset mining to build association rules?


I am new to this area and its terminology, so please correct me if I go wrong somewhere. I have two datasets like this:

Dataset 1:

A B C 0 E
A 0 C 0 0
A 0 C D E
A 0 C 0 E

The way I interpret this is at some point in time, (A,B,C,E) occurred together and so did (A,C), (A,C,D,E) etc.
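
This reading of Dataset 1 can be sketched in Python as a list of item sets. This is a minimal sketch, assuming that `0` marks an absent item and that columns are separated by whitespace; the names `raw_rows` and `to_transactions` are made up for illustration.

```python
# Dataset 1 exactly as listed above; "0" is assumed to mean "item absent".
raw_rows = """\
A B C 0 E
A 0 C 0 0
A 0 C D E
A 0 C 0 E
"""


def to_transactions(text):
    """Turn whitespace-separated rows into a list of item sets, dropping '0'."""
    return [
        frozenset(tok for tok in line.split() if tok != "0")
        for line in text.splitlines()
        if line.strip()  # skip empty lines
    ]


transactions = to_transactions(raw_rows)
for t in transactions:
    print(sorted(t))
```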

Dataset 2:

5A 1B 5C  0 2E
4A  0 5C  0  0
2A  0 1C 4D 4E
3A  0 4C  0 3E

The way I interpret this is at some point in time, 5 occurrences of A, 1 occurrence of B, 5 occurrences of C and 2 occurrences of E happened and so on.
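
Classic frequent itemset mining only needs to know whether an item is present, so one possible way to handle Dataset 2 is to parse the counts and then reduce them to presence/absence. The sketch below assumes each token is either a bare `0` or a count followed by an item name (such as `5A`); `parse_counts` and the variable names are hypothetical, and dropping the counts is an assumption that may not suit every analysis.

```python
import re

# Dataset 2 exactly as listed above.
count_rows = """\
5A 1B 5C  0 2E
4A  0 5C  0  0
2A  0 1C 4D 4E
3A  0 4C  0 3E
"""

# A token such as "5A": an integer count followed by an item name.
TOKEN = re.compile(r"^(\d+)([A-Za-z]+)$")


def parse_counts(line):
    """Return {item: count} for one row; a bare '0' does not match and is skipped."""
    counts = {}
    for tok in line.split():
        m = TOKEN.match(tok)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


count_transactions = [parse_counts(l) for l in count_rows.splitlines() if l.strip()]

# Keep only which items occurred, discarding how often (assumption).
binary_transactions = [frozenset(c) for c in count_transactions]
print(count_transactions[0])
print(sorted(binary_transactions[0]))
```

Under this assumption Dataset 2 reduces to Dataset 1, so either one can feed a frequent itemset miner.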

I am trying to find which items occur together and, if possible, the cause and effect behind this. I do not understand how to use both datasets (or whether one is enough). A good tutorial would help, but my primary question is which dataset to use and how to proceed with (i) building frequent itemsets and (ii) building association rules from them.

Can someone point me to practical tutorials/examples (preferably in Python), or at least briefly explain how to approach this problem?


Solution

  • Some theoretical facts about association rules:

    To find association rules, you can use the Apriori algorithm. Many Python implementations already exist, although most of them are not efficient enough for practical use:

    or use the Orange data mining library, which has a good module for association rules.

    Usage example:

    '''
    Save the first dataset as item.basket, one transaction per line:
    A, B, C, E
    A, C
    A, C, D, E
    A, C, E
    Then open IPython in the directory containing the file, or change
    to it with the os module:
    >>> import os
    >>> os.chdir("c:/orange")
    '''
    import orange  # legacy Orange 2.x module (Python 2)
    
    # Loads item.basket from the current directory
    items = orange.ExampleTable("item")
    # Play with the support argument to filter out rules
    rules = orange.AssociationRulesSparseInducer(items, support=0.1)
    for r in rules:
        print("%5.3f %5.3f %s" % (r.support, r.confidence, r))
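
If Orange is not available, the Apriori idea mentioned above can be sketched in plain Python. This is a minimal, unoptimized illustration and not the Orange implementation: the function names (`support`, `frequent_itemsets`, `association_rules`) and the thresholds are assumptions chosen for the example. It counts how often each candidate itemset appears, keeps those at or above a minimum support, grows them one item at a time, and then derives rules whose confidence (support of the whole itemset divided by support of the left-hand side) meets a threshold.

```python
from itertools import combinations

# The first dataset, written as item sets.
transactions = [
    frozenset("ABCE"),
    frozenset("AC"),
    frozenset("ACDE"),
    frozenset("ACE"),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    result = {}
    k = 1
    while current:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for c in current:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, then prune any
        # candidate that has an infrequent k-subset.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return result


def association_rules(freq, min_confidence):
    """Return (lhs, rhs, support, confidence) tuples derived from `freq`."""
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself in `freq`.
                conf = supp / freq[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


freq = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(freq, min_confidence=0.9)
for lhs, rhs, supp, conf in rules:
    print("%5.3f %5.3f %s -> %s" % (supp, conf, sorted(lhs), sorted(rhs)))
```

Note that a rule such as E -> A, C only says that transactions containing E also tend to contain A and C; it describes co-occurrence and does not by itself establish cause and effect.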
    

    To learn more about association rules and frequent itemset mining, my selection of books is:

    There is no shortcut.