I have created an external table in hive and need to move the data to ES (of 2 nodes, each with 1 TB). Below regular query taking very long time (more than 6 hours) for a source table with 9GB of data.
INSERT INTO TABLE <ES_DB>.<EXTERNAL_TABLE_FOR_ES>
SELECT COL1, COL2, COL3..., COL10
FROM <HIVE_DB>.<HIVE_TABLE>;
ES index is having default 5 shards and 1 replica. Increasing the number of shards could any way speed up the ingestion? Could some one suggest any improvements to speed up the ES node ingestion.
You don't mention the methodology you're using to feed the data into ES so it's hard to see if you're using an ingestion pipeline or what technology to bridge the gap. Given that, I'll stick with generic advice on how to optimize ingestion into Elasticsearch.
Elastic has published some guidance for optimizing systems for ingestion, and there are three points that we've found do make a real difference:
Finally, have you installed Kibana and monitored your nodes to know what they are limited by? In particular CPU or Memory?