Is Apache Mahout (https://mahout.apache.org/users/recommender/intro-itembased-hadoop.html) available on Google Dataproc?
Google Cloud Dataproc does not bundle Apache Mahout by default, but you can use it with Dataproc in a couple of different ways.
You can bundle it into your jar (using the Maven Shade or Assembly plugin, or the equivalent in your build tool of choice) and run it as a regular Hadoop MapReduce or Spark job.
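As a rough sketch of the bundled-jar route, the function below submits Mahout's item-based RecommenderJob as a Hadoop job through gcloud. The cluster name, region, bucket paths, and jar name are hypothetical placeholders, and it assumes the shaded jar already contains the Mahout classes and that the job can read and write gs:// paths through the Cloud Storage connector.

```shell
#!/bin/bash
# Sketch only: every value below is a hypothetical placeholder.
CLUSTER="my-cluster"
REGION="us-central1"
JOB_JAR="gs://my-bucket/jars/my-recommender-shaded.jar"  # shaded jar that bundles Mahout
INPUT="gs://my-bucket/ratings.csv"                       # userID,itemID,preference lines
OUTPUT="gs://my-bucket/recommendations"

# Submit Mahout's item-based RecommenderJob as a regular Hadoop job.
# Everything after the bare "--" is passed to the job itself, not to gcloud.
submit_recommender_job() {
  gcloud dataproc jobs submit hadoop \
    --cluster="${CLUSTER}" \
    --region="${REGION}" \
    --jars="${JOB_JAR}" \
    --class=org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    -- \
    --input "${INPUT}" \
    --output "${OUTPUT}" \
    --similarityClassname SIMILARITY_COOCCURRENCE
}
```

Once the placeholders are filled in, calling submit_recommender_job from a machine with the Cloud SDK installed should queue the job on the cluster.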
Alternatively, Mahout 0.11.0 is available as an Apache Bigtop package on Dataproc. If you run:
sudo apt-get update
sudo apt-get install mahout -y
on the master node, either after SSHing in or from an initialization action, you should have the 'mahout' command with the proper classpath.
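For the initialization-action route, a minimal sketch might look like the script below. It is not an official initialization action: the function names are made up for illustration, and it assumes the image provides the /usr/share/google/get_metadata_value helper and exposes the node role as the dataproc-role metadata attribute, as the scripts in the dataproc-initialization-actions repository do.

```shell
#!/bin/bash
# Hypothetical initialization action: install the Bigtop Mahout package
# on the master node only. Not an official script.
set -eu

# Metadata helper assumed to be present on Dataproc images.
METADATA_HELPER=/usr/share/google/get_metadata_value

# Print this node's Dataproc role ("Master" or "Worker").
# Prints nothing when the helper is missing, e.g. off-cluster.
node_role() {
  if [ -x "${METADATA_HELPER}" ]; then
    "${METADATA_HELPER}" attributes/dataproc-role
  fi
}

# Succeeds only when the given role is the master role.
is_master() {
  [ "$1" = "Master" ]
}

# Initialization actions run as root, so sudo is not needed here.
install_mahout() {
  apt-get update
  apt-get install -y mahout
}

if is_master "$(node_role)"; then
  install_mahout
else
  echo "Not a Dataproc master node; skipping Mahout install." >&2
fi
```

Uploaded to a Cloud Storage bucket, a script like this could be passed to gcloud dataproc clusters create through the --initialization-actions flag, so the 'mahout' command is already installed on the master when the cluster comes up.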
Note that Mahout 0.11.0 only supports Spark 1.3, but Dataproc (1.0) ships with Spark 1.6.1. You could download or bundle Mahout 0.12.0, which came out last week, but even that only claims to support Spark 1.5. When there is a better solution for Spark compatibility, we will create a Mahout initialization action at https://github.com/GoogleCloudPlatform/dataproc-initialization-actions.