Tags: hive, impala, beeline, hivecli

Difference between Hive, Impala and Beeline


I am new to the Hadoop ecosystem tools. Can anyone help me understand the difference between Hive, Impala and Beeline?

Thanks in advance!


Solution

  • Apache Hive :

    1] Apache Hive is a data warehouse infrastructure built on top of the Hadoop platform for data-intensive tasks such as querying, analysis, processing and visualization.
    2] Hive generates query expressions at compile time.
    3] Every Hive query suffers from the "cold start" problem: it pays a startup cost before any work begins.
    4] Hive translates queries into MapReduce jobs under the hood, which adds overhead.
    5] Hive is more universal, versatile and pluggable.
    6] For an upgrade project where compatibility and speed are equally important, Hive is an ideal choice.

    Cloudera Impala :

    1] Impala is an excellent choice for programmers running queries on HDFS and Apache HBase, as it doesn't require data to be moved or transformed.
    2] Impala performs runtime code generation for "big loops" using LLVM.
    3] Impala avoids startup overhead because its daemon processes are started at boot time and are always ready to process a query.
    4] Impala responds quickly through massively parallel processing.
    5] Impala is used when you want brute processing power and very fast analytic results.
    6] Impala is an ideal choice when starting a new project.

    Beeline :

    1] Hive CLI connects directly to the Hive Driver and requires that Hive be installed on the same machine as the client.
    2] However, Beeline connects to HiveServer2 and does not require the installation of Hive libraries on the same machine as the client.
    3] Beeline is a thin client that also uses the Hive JDBC driver, but it executes queries through HiveServer2, which allows multiple concurrent client connections and supports authentication.
    4] Cloudera's Sentry security works through HiveServer2, which Hive CLI bypasses because it connects directly to the Hive Driver. So queries run through the Hive CLI command line will not follow the policy from Sentry. According to the Cloudera docs, you should not use Hive CLI or WebHCat. Use Beeline or impala-shell instead.
    5] Connect with Beeline: url is a JDBC connection string pointing to the HiveServer2 host.
    terminal> beeline -u url -n username -p password
    OR
    terminal> beeline
    beeline> !connect jdbc:hive2://HiveServer2Host:Port
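
The connection string from point 5 can be assembled with a small shell helper. This is only a minimal sketch and not part of the original answer: the function name hs2_url and the host hs2.example.com are hypothetical, and the defaults assume HiveServer2 is listening on its usual port 10000 and that you want the database named "default".

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a HiveServer2 JDBC URL of the form
#   jdbc:hive2://<host>:<port>/<database>
# Assumed defaults: port 10000, database "default".
hs2_url() {
  local host="$1"
  local port="${2:-10000}"
  local db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

# Example use (needs Beeline installed and a reachable HiveServer2,
# so it is left commented out here):
#   beeline -u "$(hs2_url hs2.example.com)" -n username -p password -e 'SHOW DATABASES;'
```

Calling hs2_url with only a host falls back to the assumed port and database. Pass the port and database as the second and third arguments if your cluster uses different values.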