Tags: apache-spark, hive, impala, delta-lake, apache-kudu

Impala Delta Lake Integration


I have set up Delta Lake in Cloudera. It works fine with Spark and Hive.

I have searched the internet extensively for a way to integrate Delta Lake with Impala, but I did not find much information.

Has anyone done this and can explain how?

Update:

I do not need Impala to update or delete from the Delta tables. Impala will only be used to query (select) data from the Delta tables, which are built on top of Parquet.

Can this be done with good performance using the Delta Hive connector?

Basically, Impala will be used for ad hoc querying, dashboarding, and BI. If users need to update or delete data, they will do so on new tables that they create themselves (Kudu can be used here), not on the original tables that are queried.

I hope this clarifies things. Please advise, and let me know if more information is required.


Solution

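  • There is no direct integration between Impala and Delta Lake. The integration would go through the Delta Hive connector, with Impala sitting on top of Hive.

    This setup is not common, because Impala cannot delete from Hive tables, only from Kudu tables.

    Impala does not use Tez or MapReduce underneath when it reads Hive tables; it runs queries with its own engine.

    After the underlying data changes, Impala's view of the table has to be refreshed. See https://impala.apache.org/docs/build3x/html/topics/impala_refresh.html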
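
As a rough illustration of the refresh step the linked page describes, here is a minimal Python sketch. It assumes the table is already visible to Impala through the Hive metastore and that you have some DB-API style cursor connected to Impala (for example one obtained from the impyla package; that choice is an assumption, not something the setup above requires). The helper names `refresh_statement` and `refresh_and_query`, and the database and table names in the usage note, are hypothetical.

```python
import re

# Accept only plain identifiers so that names can be placed in SQL text safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def refresh_statement(database, table):
    """Build an Impala REFRESH statement for one table (hypothetical helper)."""
    for name in (database, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"REFRESH `{database}`.`{table}`"


def refresh_and_query(cursor, database, table, limit=10):
    """Refresh Impala's metadata for the table, then read a few rows.

    `cursor` is assumed to be any DB-API style cursor connected to Impala.
    """
    # Ask Impala to pick up files written by Spark or Hive since the last refresh.
    cursor.execute(refresh_statement(database, table))
    # Read-only query, matching the select-only use case in the question.
    cursor.execute(f"SELECT * FROM `{database}`.`{table}` LIMIT {int(limit)}")
    return cursor.fetchall()
```

With a live cursor this would be called as `refresh_and_query(cursor, "sales", "events", limit=5)`, where `sales` and `events` are placeholder names. Whether Impala can actually read a given connector-backed table depends on your Impala and connector versions, so treat this as a sketch of the refresh-then-select pattern only.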