
How to write a recommender with Mahout on Spark


Mahout 0.13.0 / Java 8

I am completely new to Mahout and am trying to understand how to implement a recommendation engine with it. So far, I know the following.

Mahout provides four types of algorithms:

  1. Collaborative filtering (non-Hadoop-based)
  2. Classification (Hadoop-based)
  3. Clustering (Hadoop-based)
  4. Content-based filtering

For my first recommender, I started with collaborative filtering, which is easy to implement without Hadoop.

Collaborative filtering:

  1. User-based recommendation
  2. Item-based recommendation
  3. Slope One
  4. And some more
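For readers new to Slope One, here is a minimal, dependency-free Java sketch of the weighted Slope One scheme named in the list. It is not Mahout code: the class `SlopeOneSketch`, the method `predict`, and the nested-map rating layout (user to item to rating) are hypothetical choices for illustration. For each item the user has already rated, it computes how much other users rated the target item above or below that item on average, then weights those per-item predictions by how many users rated both items.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, dependency-free sketch of weighted Slope One (not Mahout's API). */
final class SlopeOneSketch {

    /**
     * Predicts how `user` would rate `targetItem`.
     * ratings: userId -> (itemId -> rating); `user` must be present.
     * Returns NaN when no other user rated both the target and one of the user's items.
     */
    static double predict(Map<String, Map<String, Double>> ratings,
                          String user, String targetItem) {
        Map<String, Double> userRatings = ratings.get(user);
        double numerator = 0.0;
        int denominator = 0;
        for (Map.Entry<String, Double> rated : userRatings.entrySet()) {
            String item = rated.getKey();
            if (item.equals(targetItem)) {
                continue;
            }
            // Average deviation (targetItem - item) over users who rated both.
            double diffSum = 0.0;
            int count = 0;
            for (Map<String, Double> other : ratings.values()) {
                if (other.containsKey(item) && other.containsKey(targetItem)) {
                    diffSum += other.get(targetItem) - other.get(item);
                    count++;
                }
            }
            if (count == 0) {
                continue;
            }
            double avgDeviation = diffSum / count;
            // Weight this item's prediction by the number of co-raters.
            numerator += (rated.getValue() + avgDeviation) * count;
            denominator += count;
        }
        return denominator == 0 ? Double.NaN : numerator / denominator;
    }
}
```

With the classic three-user example (John: A=5, B=3, C=2; Mark: A=3, B=4; Lucy: B=2, C=5), `predict(ratings, "Lucy", "A")` returns 13/3, about 4.33.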

Mahout interfaces:

  1. DataModel
  2. UserSimilarity
  3. ItemSimilarity
  4. UserNeighborhood
  5. Recommender

I understand these components and have written user-based and item-based recommenders using several combinations of similarities and neighborhoods.
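To show how the five interfaces fit together, here is a minimal, dependency-free Java sketch of a user-based recommender in which each method plays the role of one interface. It is an illustration, not Mahout's implementation: `UserBasedSketch` and its methods are hypothetical, and it uses cosine similarity (unrated items treated as 0) where a typical Mahout setup might use `PearsonCorrelationSimilarity`. In Mahout's Taste API the real counterparts are `FileDataModel`, a `UserSimilarity` such as `PearsonCorrelationSimilarity`, `NearestNUserNeighborhood`, and `GenericUserBasedRecommender`.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical, dependency-free user-based recommender; each method mirrors one Taste interface. */
final class UserBasedSketch {

    // "DataModel": userId -> (itemId -> rating)
    private final Map<String, Map<String, Double>> ratings;

    UserBasedSketch(Map<String, Map<String, Double>> ratings) {
        this.ratings = ratings;
    }

    // "UserSimilarity": cosine similarity over rating vectors, unrated items counted as 0.
    double similarity(String u, String v) {
        Map<String, Double> a = ratings.get(u), b = ratings.get(v);
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = Math.sqrt(a.values().stream().mapToDouble(x -> x * x).sum());
        double normB = Math.sqrt(b.values().stream().mapToDouble(x -> x * x).sum());
        return (normA == 0 || normB == 0) ? 0.0 : dot / (normA * normB);
    }

    // "UserNeighborhood": the n most similar other users with positive similarity.
    List<String> neighborhood(String user, int n) {
        return ratings.keySet().stream()
                .filter(v -> !v.equals(user))
                .filter(v -> similarity(user, v) > 0)
                .sorted(Comparator.comparingDouble((String v) -> similarity(user, v)).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    // "Recommender": similarity-weighted average of neighbors' ratings for items the user has not rated.
    List<String> recommend(String user, int neighbors, int howMany) {
        Map<String, Double> weightedSum = new HashMap<>();
        Map<String, Double> weightTotal = new HashMap<>();
        for (String v : neighborhood(user, neighbors)) {
            double sim = similarity(user, v);
            for (Map.Entry<String, Double> e : ratings.get(v).entrySet()) {
                if (ratings.get(user).containsKey(e.getKey())) {
                    continue; // already rated by the target user
                }
                weightedSum.merge(e.getKey(), sim * e.getValue(), Double::sum);
                weightTotal.merge(e.getKey(), sim, Double::sum);
            }
        }
        return weightedSum.keySet().stream()
                .sorted(Comparator.comparingDouble(
                        (String i) -> weightedSum.get(i) / weightTotal.get(i)).reversed())
                .limit(howMany)
                .collect(Collectors.toList());
    }
}
```

For example, with alice = {i1: 5, i2: 3}, bob = {i1: 5, i2: 3, i3: 4} and carol = {i1: 1, i4: 5}, `recommend("alice", 1, 1)` uses only bob as the neighbor and returns `[i3]`. Because scores are weighted averages, widening the neighborhood to two users lets carol's single rating of 5 for i4 outrank bob's 4 for i3 despite her much lower similarity; Mahout's `ThresholdUserNeighborhood` is one way to keep weakly similar users out.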

Questions:

  1. Since collaborative filtering is based on MapReduce, did Mahout deprecate MapReduce completely in the 0.13.0 release? Have all the collaborative filtering algorithms been deprecated? If so, what is the alternative? Is it Spark, because MapReduce is slow compared to Spark?
  2. I checked, and Mahout supports Spark and Flink as well. Mahout on Spark provides two similarity jobs, spark-itemsimilarity and spark-rowsimilarity, but I haven't found any Java example of building a recommender with them.
  3. Mahout on Spark may be more compatible with Scala, but can we write a recommendation engine based on spark-itemsimilarity and spark-rowsimilarity in Java? Please suggest an example as well.
  4. Can Mahout on Spark run standalone, without Hadoop? As far as I know, Spark is an alternative to Hadoop that allows real-time processing. Which libraries do I need to add besides mahout-spark_2.10-0.13.0.jar and mahout-spark_2.10-0.13.0-dependency-reduced.jar?
  5. Is Mahout on Spark different from standalone Apache Spark? I am considering using standalone Apache Spark as well.

Can someone please clarify?


Solution

  • 1) MapReduce was deprecated completely in 0.10.0. The 'new Mahout' is a mathematically expressive Scala DSL that is abstracted away from the execution engine, so the same Scala code should compile for Flink, Spark, or other engines. And yes, this was based on performance.

    2) Not much work has been done on the Java API, but I've heard that some people are working on it.

    3) I think you're asking whether you could write a Spark recommendation engine in Java. The answer is yes. I haven't done much porting between Scala and Java, but in theory you should be able to import the Scala functions and classes into your Java code. This link shows more about writing a recommender from scratch; it is in Scala, so you'd need to port it to Java (if you do, feel free to open a PR and we'll include it as an example).

    4) Yes, it can. This link describes how to set up Spark with Mahout in Zeppelin, but the principles are the same for any setup (e.g. which jars you need and which SparkConf settings you need to tweak). IIRC, you need mahout-spark, mahout-math, and mahout-math-scala. (You only need the spark dependency-reduced jar for local shell programs, e.g. Zeppelin or the Mahout Spark shell.)

    5) Yes. Mahout is a library that runs on top of Spark or other distributed engines, while Spark itself is the engine.
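Regarding points 1) and 3): as far as I understand, spark-itemsimilarity does not rank item pairs by raw co-occurrence counts but scores them with Dunning's log-likelihood ratio (LLR). In the Scala DSL the co-occurrence counts for all item pairs come from an expression like `drmA.t %*% drmA`, where `drmA` is a distributed users-by-items matrix, and the engine decides how to execute it. The dependency-free Java sketch below does the same scoring in memory for a single item pair, which may help when porting the idea to Java. The class and method names are hypothetical; the LLR formula follows the entropy form used in Mahout's `LogLikelihood.logLikelihoodRatio` (recalled from memory, so check it against the Mahout source).

```java
/** Hypothetical, in-memory sketch of LLR-scored item co-occurrence (not Mahout's API). */
final class LlrCooccurrenceSketch {

    // x * ln(x), defined as 0 when x == 0.
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized entropy: xLogX(total) minus the sum of xLogX(count).
    private static double entropy(long... counts) {
        long total = 0;
        double sumXLogX = 0.0;
        for (long k : counts) {
            total += k;
            sumXLogX += xLogX(k);
        }
        return xLogX(total) - sumXLogX;
    }

    /**
     * LLR for a 2x2 contingency table of users:
     * k11 = both items, k12 = item A only, k21 = item B only, k22 = neither.
     */
    static double llr(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        if (rowEntropy + columnEntropy < matrixEntropy) {
            return 0.0; // guard against floating-point round-off
        }
        return 2.0 * (rowEntropy + columnEntropy - matrixEntropy);
    }

    /** LLR score for items a and b of a users-by-items 0/1 interaction matrix. */
    static double itemLlr(int[][] interactions, int a, int b) {
        long both = 0, onlyA = 0, onlyB = 0, neither = 0;
        for (int[] user : interactions) {
            boolean hasA = user[a] != 0;
            boolean hasB = user[b] != 0;
            if (hasA && hasB) {
                both++;
            } else if (hasA) {
                onlyA++;
            } else if (hasB) {
                onlyB++;
            } else {
                neither++;
            }
        }
        return llr(both, onlyA, onlyB, neither);
    }
}
```

Independent items, e.g. `llr(10, 10, 10, 10)`, score about 0, while a perfectly associated pair such as `llr(10, 0, 0, 10)` scores 40 ln 2, about 27.7. This form of LLR is two-sided: two items that never appear together score as high as two that always do.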
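Regarding point 4): a Mahout-on-Spark application typically switches Spark to the Kryo serializer and registers Mahout's Kryo registrator. The sketch below only collects those properties in a plain Java map so that it runs without Spark on the classpath; `MahoutSparkSettings` is a hypothetical helper. The serializer and registrator entries are the settings Mahout's Spark bindings rely on; `spark.kryo.referenceTracking` and the buffer size are recalled from Mahout's Zeppelin setup notes and should be treated as example values to verify for your version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper listing Spark properties commonly set for Mahout's Spark bindings. */
final class MahoutSparkSettings {

    static Map<String, String> kryoSettings() {
        Map<String, String> conf = new LinkedHashMap<>();
        // Mahout's vectors and matrices are serialized with Kryo.
        conf.put("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
        // Registers Mahout's math types with Kryo.
        conf.put("spark.kryo.registrator", "org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator");
        // Assumed value, as recalled from Mahout's Zeppelin setup; verify for your version.
        conf.put("spark.kryo.referenceTracking", "false");
        // Example maximum buffer size (assumption); large matrices may need more.
        conf.put("spark.kryoserializer.buffer.max", "600m");
        return conf;
    }
}
```

With Spark on the classpath, call `conf.set(key, value)` on your `SparkConf` for each entry, and add the mahout-spark, mahout-math, and mahout-math-scala jars to the application classpath.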