mongodb data-mining market-basket-analysis bigdata

How to implement association rule analysis or market basket analysis from scratch?


I have gone through numerous articles trying to understand what my first step should be to incorporate association analysis (maybe market basket analysis) into my system. They all go deep into implementing the algorithms, but none of them discuss how to store the data in the first place. I would really appreciate it if someone could give me some starting pointers or article links to begin with.

The first thing I want to implement is tracking user clicks and providing suggestions based on the tracked data. For example, a user clicks on link A and subsequently on link B and link C. I can track this activity along with some associated metadata (user, user organization, user role, etc.).

I do not want it to be limited to links. In the future, I want to add a number of similar use cases to the system and make it smart. For example, if a user sets specific values for fields A and B, he/she will most likely set value <bla> for field C.

My system may generate several thousand such data points a day (e.g. user clicks, field selections, etc.).

Below are my questions:

  1. How should I store my data? Should I go with SQL or NoSQL? (I briefly looked into MongoDB and it looked promising.)

  2. What tool should I use to perform the association analysis? Are there any open source tools I can use?


Solution

    1. It depends. Is your data suitable for NoSQL databases? To answer this question, it helps to read about the CAP theorem and its case studies: https://en.wikipedia.org/wiki/CAP_theorem or http://robertgreiner.com/2014/06/cap-theorem-explained/ . Sometimes you want consistency (depending on your data) and availability, in which case it is better to use a relational database such as MySQL. (Try to read case studies and analyse your data to pick the best tool.)
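Whichever database you pick, the mining algorithms usually expect "transactions" (one set of items per basket), so it may help to store raw events in a form that is easy to group later. Below is a minimal sketch of that idea, assuming a flat event log and assuming that one session is one basket. The table name `events`, its columns, and the helper `build_transactions` are hypothetical names made up for illustration, and SQLite is used only because it ships with Python; the same shape could be stored as documents in MongoDB.

```python
import sqlite3
from collections import defaultdict


def build_transactions(rows):
    """Group (session_id, item) pairs into one set of items per session."""
    baskets = defaultdict(set)
    for session_id, item in rows:
        baskets[session_id].add(item)
    return dict(baskets)


# Hypothetical schema: one row per tracked event (click, field selection, ...).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " user_id TEXT, session_id TEXT, item TEXT, ts INTEGER)"
)

# Made-up sample data: user u1 clicks A, B, C; user u2 clicks A, B.
events = [
    ("u1", "s1", "link_A", 1),
    ("u1", "s1", "link_B", 2),
    ("u1", "s1", "link_C", 3),
    ("u2", "s2", "link_A", 4),
    ("u2", "s2", "link_B", 5),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", events)

rows = conn.execute(
    "SELECT session_id, item FROM events ORDER BY ts"
).fetchall()
transactions = build_transactions(rows)
```

Extra metadata such as organization or role can be added as more columns (or document fields) and used to filter the events before grouping them into baskets.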

    2. There are a large number of open source libraries, but in my opinion it is better to first read up on the concepts and algorithms. Try searching for the Apriori, ECLAT, and FP-Growth algorithms and get the concepts behind them. Then you can pick a tool or write the code yourself. Some useful tools (depending on your programming language):

    Python: https://github.com/asaini/Apriori, https://github.com/enaeseth/python-fp-growth, https://github.com/enaeseth/python-fp-growth/blob/master/fp_growth.py

    PHP: https://github.com/sigidhanafi/fp-growth-php

    Java: https://github.com/goodinges/FP-Growth-Java, http://www.philippe-fournier-viger.com/spmf/

    Also you can use Spark: https://spark.apache.org/docs/1.1.1/mllib-guide.html
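To get a feel for the concepts before picking one of the tools above, here is a minimal from-scratch sketch of Apriori in plain Python. It is not the code of any of the linked libraries; the function names `apriori` and `rules`, the thresholds, and the sample baskets are assumptions made for illustration. It finds frequent itemsets level by level, then derives rules "antecedent => consequent" with confidence = support(itemset) / support(antecedent).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level 1 candidates: every distinct single item.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: unions of frequent k-itemsets that have size k + 1.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose every (k - 1)-subset is frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, confidence) tuples."""
    out = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    out.append((antecedent, itemset - antecedent, confidence))
    return out


# Made-up sample baskets (e.g. links clicked in one session).
baskets = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
frequent = apriori(baskets, min_support=0.5)
found = rules(frequent, min_confidence=0.6)
```

This sketch rescans every transaction at each level, so it is only suitable for small data sets. For several thousand events a day accumulated over time, FP-Growth or one of the libraries listed above is likely a better fit.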