hadoop apache-spark bigtop

Apache Bigtop Installation on RHEL 7


I've been tasked with standing up a Hadoop cluster at work and could use some help. So far I have only done single-node setups on laptops at home. I want to stick with the open-source Apache stack to avoid licensing costs; right now we have no interest in Cloudera or Hortonworks.

I came across the Apache Bigtop stack (1.2.0) and poked around in it. I'm still trying to wrap my head around what it provides; for example, I haven't found a reference to the Hadoop/Spark versions it ships. Could I get some help with the following:

  1. What versions of Hadoop/Spark/other tools does the 1.2.0 version provide?

  2. Is there a good reference on installing a full Hadoop/Spark cluster from scratch under RHEL 7? I have 12 servers and plan on 2 namenodes and 10 datanodes. Is Bigtop appropriate for this, or should I just install and configure each package manually?

  3. I found the following:

https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+1.2.0

This looks promising, but it's for CentOS 7, which I know is similar but not exactly the same. Can someone suggest how to modify it to work under RHEL 7? I found repos, but none for RHEL.

  4. The documentation seems pretty slim on the official Apache page, or maybe I'm just not looking in the right spot. Are there good references out there for a full cluster install?

Thanks to all who can help, I really appreciate it!


Solution

  • What versions of Hadoop/Spark/other tools does the 1.2.0 version provide?

    Check out our doc for the 1.2.0 release:

    https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+1.2.0+Release

    You'll get Hadoop 2.7.3 and Spark 2.1.0 out of the box. We've provided installable artifacts on S3 so you can test the functionality:

    https://www.apache.org/dist/bigtop/bigtop-1.2.0/repos/centos7/bigtop.repo

    NOTE: an S3 migration takes effect on October 15, 2017, and we'll make the corresponding changes to the repo file afterwards. If you'd like to try it out before then, change the baseurl to:

    http://repos.bigtop.apache.org/releases/1.2.0/centos/7/x86_64

  • Is there a good reference on installing a full Hadoop/Spark cluster from scratch under RHEL 7? I have 12 servers, I plan on doing 2 namenodes and 10 datanodes. Is BigTop appropriate for this, or should I just install each package and configure manually?

    RHEL and CentOS should be very similar, so I suggest trying the CentOS 7 repo on RHEL 7.

  • I found the following: https://cwiki.apache.org/confluence/display/BIGTOP/Bigtop+1.2.0+Release

    Yes, you're looking at the right doc. As mentioned above, although it's written for CentOS 7, you can try the same repo on RHEL 7.
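
For anyone who wants to script the baseurl change mentioned in the note above, here is a minimal Python sketch. It assumes the downloaded bigtop.repo is an ordinary INI-style yum repo file; the sample contents, the `[bigtop]` section name, the placeholder old URL, and the helper name `set_baseurl` are all hypothetical stand-ins for illustration, not the contents of the real file. Only the new baseurl is taken from the answer.

```python
import configparser
import io

# New location given in the answer above.
NEW_BASEURL = "http://repos.bigtop.apache.org/releases/1.2.0/centos/7/x86_64"


def set_baseurl(repo_text, baseurl=NEW_BASEURL):
    """Return repo_text with the baseurl of every repo section replaced."""
    # interpolation=None so that any '%' in a value is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    for section in parser.sections():
        parser[section]["baseurl"] = baseurl
    out = io.StringIO()
    # Write "key=value" without spaces, matching the usual .repo layout.
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical stand-in for the downloaded bigtop.repo (not the real file).
SAMPLE_REPO = """\
[bigtop]
name=Bigtop
enabled=1
gpgcheck=1
baseurl=http://example.invalid/old/location
"""

updated = set_baseurl(SAMPLE_REPO)
print(updated)
```

The result would then be saved under /etc/yum.repos.d/ (for example as bigtop.repo) on each RHEL 7 node. Note that configparser drops comments when it rewrites a file, so for a one-off change editing the baseurl line by hand works just as well.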