scalahadoopmapreducejvm-languages

what are the options for hadoop on scala


We are starting a big-data based analytic project and we are considering to adopt scala (typesafe stack). I would like to know the various scala API's/projects which are available to do hadoop , map reduce programs.


Solution

  • Definitely check out Scalding. Speaking as a user and occasional contributor, I've found it to be a very useful tool. The Scalding API is also meant to be very compatible with the standard Scala collections API. Just as you can call flatMap, map, or groupBy on normal collections, you can do the same on scalding Pipes, which you can imagine as a distributed List of tuples. There's also a typed version of the API which provides stronger type-safety guarantees. I haven't used Scoobi, but the API seems similar to what they have.

    Additionally, there are a few other benefits: