hive hana vora

Is it possible to load Hive data into Vora?


I checked the Developer Guide but didn't find the answer. So far I have been able to load CSV and ORC files from HDFS into Vora, but is it possible to load from Hive?

Since I can't specify Hive as a source, I tried setting "paths" to /apps/hive/warehouse/tablename/00000_0 (or whatever the part file is named). However, if a Hive table is stored as multiple files in the /tablename/ directory, I'd have to list each of them explicitly in "paths", which is not ideal. Is there a better way?

Update: The context for this question is that, since Vora doesn't provide data persistence, I'd like to use the Hive warehouse as the persistence layer. It is ultimately still files, but with some extra organization. With Hadoop in the SAP ecosystem, I could use SAP Data Services with the Hive adapter to load files from outside into Hadoop (and dump data from Hadoop into files, if required), and make that data available via Vora.


Solution

  • There is no automatic way to load or migrate Hive tables into Vora. The way to go is to create Vora tables based on the (Hive-organized) files in HDFS.

    The paths option accepts a wildcard * to load all files from a particular directory in HDFS. This works for CSV, Parquet, and ORC files. E.g. paths "/path_to_my_dir1/*,/path_to_my_dir2/*"
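
    As a rough illustration of the wildcard approach, here is a small Python sketch that builds the paths value for one or more Hive table directories and drops it into a CREATE TABLE statement. The helper names (hive_table_paths, vora_create_table_sql), the column list, and the default warehouse root are made up for this example. The statement layout (USING com.sap.spark.vora with tableName and paths options) is an assumption based on the Vora Spark SQL data source, so check it against the Developer Guide for your Vora version before relying on it.

```python
import posixpath


def hive_table_paths(table_names, warehouse_root="/apps/hive/warehouse"):
    """Build a comma-separated 'paths' value with one wildcard entry per table directory.

    warehouse_root is the directory used in the question; adjust it for your cluster.
    """
    return ",".join(
        posixpath.join(warehouse_root, name, "*") for name in table_names
    )


def vora_create_table_sql(table_name, columns, paths):
    """Return a CREATE TABLE statement string (hypothetical helper).

    ASSUMPTION: the data source name and option names below follow the
    Vora Spark SQL syntax; they are not confirmed by the answer above.
    """
    column_list = ", ".join(f"{col} {sql_type}" for col, sql_type in columns)
    return (
        f"CREATE TABLE {table_name} ({column_list})\n"
        "USING com.sap.spark.vora\n"
        f'OPTIONS (tableName "{table_name}", paths "{paths}")'
    )


if __name__ == "__main__":
    # Illustrative table and columns only; they are not taken from the question.
    paths = hive_table_paths(["tablename"])
    print(vora_create_table_sql("tablename", [("id", "int"), ("name", "string")], paths))
```

    The statement would still have to be run from a Spark session that has the Vora data source available. Note that partitioned Hive tables keep their files in subdirectories of the table directory, so a single-level wildcard may not pick them up; that case is not covered by the answer above.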