
Giraph, Hadoop, Spark and Cassandra


Is it possible to use Giraph if I have Spark clusters and Cassandra, but no Hadoop cluster?

I currently use GraphX and would like to switch to Giraph. Can I do that with only Spark and Cassandra in place?


Solution

  • I have only limited experience with Giraph, from years ago, and I never tried using it outside a Hadoop cluster. But it looks like what you want is at least technically possible, if not necessarily easy.

    The companion code for Practical Graph Analytics with Apache Giraph is a useful reference here. It requires Hadoop on the classpath (for DoubleWritable and Text, for example), but it does nothing with a Hadoop cluster. Instead, it works with in-memory arrays. It looks like all you need to do is extend BasicComputation and implement its compute method to do whatever you need with Cassandra, as long as you keep Hadoop around as a dependency to satisfy the type bounds on BasicComputation.

    I never found Giraph terribly intuitive, but hopefully you can make this unconventional setup work.
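
To make the shape of a compute implementation more concrete, here is a rough sketch of the vertex-centric superstep loop written in plain Java over in-memory arrays. It is an imitation of the programming model only: it uses no Giraph or Hadoop classes, and the names MaxValueSketch, compute and run are hypothetical. In real Giraph the per-vertex logic would live in the compute method of a BasicComputation subclass and would use Hadoop Writable types such as DoubleWritable instead of primitives. The example propagates the maximum value along directed edges until nothing changes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java imitation of a vertex-centric superstep loop (not the Giraph API). */
final class MaxValueSketch {

    /** Hypothetical stand-in for per-vertex logic: keep the largest value seen so far. */
    static double compute(double current, List<Double> messages) {
        double best = current;
        for (double m : messages) {
            best = Math.max(best, m);
        }
        return best;
    }

    /** Runs supersteps until no vertex changes; edges[v] lists the out-neighbours of vertex v. */
    static double[] run(double[] initial, int[][] edges) {
        int n = initial.length;
        double[] values = initial.clone();
        boolean[] changed = new boolean[n];
        Arrays.fill(changed, true); // in the first superstep every vertex sends its value

        boolean anyChanged = true;
        while (anyChanged) {
            // Send phase: vertices whose value changed notify their out-neighbours.
            List<List<Double>> inbox = new ArrayList<>();
            for (int v = 0; v < n; v++) {
                inbox.add(new ArrayList<>());
            }
            for (int v = 0; v < n; v++) {
                if (changed[v]) {
                    for (int target : edges[v]) {
                        inbox.get(target).add(values[v]);
                    }
                }
            }
            // Compute phase: each vertex folds its incoming messages into its value.
            anyChanged = false;
            for (int v = 0; v < n; v++) {
                double updated = compute(values[v], inbox.get(v));
                changed[v] = updated > values[v];
                values[v] = updated;
                anyChanged |= changed[v];
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Directed cycle 0 -> 1 -> 2 -> 0 with made-up starting values.
        double[] result = run(new double[] {3, 1, 7}, new int[][] {{1}, {2}, {0}});
        System.out.println(Arrays.toString(result)); // prints [7.0, 7.0, 7.0]
    }
}
```

In this sketch the starting values and edges are hard-coded arrays. In the setup from the question they would presumably be read from Cassandra before the computation starts, which is the part you would have to write yourself.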