hadoop elasticsearch hbase elasticsearch-hadoop

What is ElasticSearch-Hadoop (es-hadoop) and its benefit over HBase for a live web application?


It is not entirely clear to me what es-hadoop is from the description.

Is this merely a "connector" that will move data over from your ES cluster to HDFS for Hadoop analytics? If so, why not just go with HBase for low-latency text queries?

Is es-Hadoop a different installation than regular ES?

Some clarification please.

Thanks.


Solution

  • ES-Hadoop is best described as a connector between the Hadoop ecosystem and ES. It is not a separate release of ES.

    Basically, it improves the integration between Hadoop ecosystem applications and ES. In my organisation we use it for two purposes:

    1. Before indexing data into ES, we use Spark to analyse it and perform the relevant aggregations, which reduces the amount of indexing ES has to do. ES-Hadoop helps us index directly from Spark data structures into ES. We start the indexing process with a single line of code and don't need to write the indexing program ourselves. (The feature is configurable, so you have the flexibility to index the data however you like.)

    2. We use ES as our near-real-time analytics cluster. The data in ES is laid out to give the best performance for our clients. Sometimes (usually when we have ideas for new features) we have to pull the data out of ES and perform some complex processing on it. In those cases we can create a Spark data structure from ES data in a single line of code as well.
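
To make the two use cases above more concrete, here is a minimal PySpark sketch. It assumes the elasticsearch-hadoop (elasticsearch-spark) jar is on Spark's classpath and that an ES node is reachable at the given host and port. The helper names (`es_options`, `write_to_es`, `read_from_es`) and the index name in the usage note are made up for illustration; the data source name `org.elasticsearch.spark.sql` and the `es.nodes` / `es.port` settings come from the connector.

```python
# Sketch only: assumes the elasticsearch-hadoop jar is on Spark's classpath
# and that an ES node is reachable at the configured host and port.
# The helper names below are hypothetical, not part of any library.


def es_options(nodes="localhost", port=9200):
    """Build connector settings (es.nodes and es.port are ES-Hadoop settings)."""
    return {"es.nodes": nodes, "es.port": str(port)}


def write_to_es(df, index, options):
    """Use case 1: index a (pre-aggregated) Spark DataFrame into ES."""
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .mode("append")
       .save(index))


def read_from_es(spark, index, options):
    """Use case 2: load an ES index back into Spark as a DataFrame."""
    return (spark.read
                 .format("org.elasticsearch.spark.sql")
                 .options(**options)
                 .load(index))
```

With a real `SparkSession` called `spark` and a DataFrame `df`, usage would look roughly like `write_to_es(df, "events", es_options("es-host"))` followed by `read_from_es(spark, "events", es_options("es-host"))`, where `events` and `es-host` are placeholder names. The actual indexing or loading is the single chained call inside each helper, which is the "single line of code" mentioned above.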

    So, ES-Hadoop is closer to a well-written connector. You still have to transport the data from your ES cluster to Hadoop.

    I'm not sure about the comparison to HBase. You can't really compare HBase, which is a key-value store, with ES, which is a general-purpose search engine that has gained very nice analytics capabilities in recent versions. As I see it, these are different tools that address different sets of problems.