As a developer, I've created an HBase table for our project by importing data from an existing MySQL table using a Sqoop job. The problem is that our data analyst team is familiar with MySQL syntax, which implies they can query a Hive table easily. For them, I need to expose the HBase table in Hive. I don't want to duplicate the data by populating it again in Hive, since duplicated data might cause consistency issues in the future.
Can I expose the HBase table in Hive without duplicating data? If yes, how do I do it? Also, if I insert, update, or delete data in my HBase table, will the updated data appear in Hive without any issues?
Sometimes our data analytics team creates tables and populates data in Hive. Can I expose those tables to HBase? If yes, how?
HBase-Hive Integration:
Creating an external table in Hive for an HBase table allows the HBase data to be queried in Hive without duplicating it. The external table stores no data of its own and reads from HBase at query time, so you can insert, update, or delete data in the HBase table and see the modified data in Hive too.
Example:
Consider you have an HBase table named hbasetable with column families id, name and email.
Sample external table command for Hive:
CREATE EXTERNAL TABLE hivehbasetable(key INT, id INT, username STRING, password STRING, email STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,id:id,name:username,name:password,email:email")
TBLPROPERTIES ("hbase.table.name" = "hbasetable");
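Once the external table exists, the analysts can query it with ordinary HiveQL. The following is a minimal sketch, assuming the hivehbasetable definition above. The row key value 42 is a hypothetical example, not something taken from your data.

```sql
-- Plain HiveQL against the HBase-backed table; rows are read from HBase
-- at query time, so nothing is copied into Hive.
SELECT key, username, email
FROM hivehbasetable
WHERE key = 42;  -- hypothetical row key

-- An aggregate of the kind an analyst might run
SELECT COUNT(*) FROM hivehbasetable;
```

Because each query goes back to HBase, re-running it after a change in HBase should return the changed rows.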
For more information on Hive-HBase integration, see the HBase Integration page in the Apache Hive documentation.
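For the reverse direction (data that the analysts created in Hive), one common approach is to create an HBase-backed table from Hive and fill it from the existing Hive table. This is only a sketch under assumptions: hive_source, hive_to_hbase, hbase_from_hive and the column family cf are hypothetical names, and the source table is assumed to have id, username and email columns.

```sql
-- Hypothetical names throughout: hive_source is the existing Hive table,
-- hbase_from_hive is the HBase table that Hive creates for this definition.
CREATE TABLE hive_to_hbase(key INT, username STRING, email STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:username,cf:email")
TBLPROPERTIES ("hbase.table.name" = "hbase_from_hive");

-- Copy the rows into HBase. Rows that share the same key overwrite each
-- other, so the first selected column should be unique.
INSERT OVERWRITE TABLE hive_to_hbase
SELECT id, username, email FROM hive_source;
```

Unlike the external-table case, this does write a copy of the data into HBase, so the original Hive table and the HBase table can drift apart unless one of them is dropped or the load is repeated. Some newer Hive versions may require the EXTERNAL keyword for storage-handler tables, so check the behaviour on your version.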