[SOLVED] Difference between hive, impala and beeline

I am new to the Hadoop ecosystem tools. Can anyone help me understand the difference between Hive, Impala and Beeline?

Thanks in advance!

Solution

Apache Hive :

1] Apache Hive is a data warehouse infrastructure built on top of the Hadoop platform for data-intensive tasks such as querying, analysis, processing and visualization.
2] Hive generates query expressions at compile time.
3] Every Hive query suffers from the "cold start" problem, since the execution job has to be launched before any data is processed.
4] Under the hood, Hive translates queries into MapReduce jobs, which adds overhead.
5] Hive is the more universal, versatile and pluggable option.
6] For an upgrade project where compatibility and speed are equally important, Hive is an ideal choice.

Cloudera Impala :

1] Impala is an excellent choice for running queries on data in HDFS and Apache HBase, as it doesn't require the data to be moved or transformed.
2] Impala does runtime code generation for "big loops" using LLVM.
3] Impala avoids startup overhead because its daemon processes are started at boot time and are always ready to process a query.
4] Impala responds quickly through massively parallel processing.
5] Impala is used to unleash brute processing power and deliver very fast analytic results.
6] Impala is an ideal choice when starting a new project.
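
As a rough illustration of how the two engines are invoked, the sketch below prints (rather than runs) the command lines for submitting one query through Hive CLI and through impala-shell. The host impalad.example.com, the table web_logs and the helper function names are assumptions made up for this example; 21000 is a commonly used impala-shell port, but check what your cluster uses.

```shell
#!/bin/sh
# Hypothetical query and host; replace with values from your own cluster.
QUERY='SELECT COUNT(*) FROM web_logs'
IMPALAD='impalad.example.com:21000'

# Hive CLI: -e runs a quoted query string and exits.
hive_cmd() {
    printf 'hive -e "%s"\n' "$1"
}

# impala-shell: -i names the impalad to connect to, -q runs one query.
impala_cmd() {
    printf 'impala-shell -i %s -q "%s"\n' "$1" "$2"
}

# Print both command lines so they can be compared side by side.
hive_cmd "$QUERY"
impala_cmd "$IMPALAD" "$QUERY"
```

The query text is the same in both cases; what differs is the engine that executes it.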

Beeline :

1] Hive CLI connects directly to the Hive Driver and requires that Hive be installed on the same machine as the client.
2] Beeline, by contrast, connects to HiveServer2 and does not require the Hive libraries to be installed on the same machine as the client.
3] Beeline is a thin client that uses the Hive JDBC driver and executes queries through HiveServer2, which allows multiple concurrent client connections and supports authentication.
4] Cloudera's Sentry security works through HiveServer2, which Hive CLI bypasses, so queries run through the hive command line will not follow the Sentry policy. According to the Cloudera docs, you should not use Hive CLI or WebHCat; use beeline or impala-shell instead.
5] Connect with Beeline (url is a JDBC connection string pointing to the HiveServer2 host):
terminal> beeline -u url -n username -p password
or
terminal> beeline
beeline> !connect jdbc:hive2://HiveServer2Host:Port
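
To make the url part more concrete, here is a minimal sketch that assembles a HiveServer2 JDBC URL and prints a non-interactive Beeline command instead of running it. The host hs2.example.com, the user analyst, the database default and the helper name hs2_url are assumptions for illustration; 10000 is the usual HiveServer2 port, but yours may differ.

```shell
#!/bin/sh
# Build a HiveServer2 JDBC URL from host, port and database.
hs2_url() {
    printf 'jdbc:hive2://%s:%s/%s\n' "$1" "$2" "$3"
}

# Hypothetical connection details; adjust for your cluster.
URL=$(hs2_url hs2.example.com 10000 default)

# Print the Beeline command rather than executing it:
# -u is the JDBC URL, -n the user name, -e a query to run.
printf 'beeline -u "%s" -n %s -e "%s"\n' "$URL" analyst 'SHOW TABLES'
```

Once the printed command looks right, paste it into a terminal on a machine that has Beeline installed, adding -p password if your HiveServer2 requires one.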