Writing from Spark to Elasticsearch is very slow for me: making the initial connection alone takes around 15 minutes, during which both Spark and Elasticsearch remain idle. Another thread in the Elastic community forum describes the same issue, but it was closed without a solution.
This is how I am writing from Spark to ES:
vgDF.write.format("org.elasticsearch.spark.sql").mode('append').option("es.resource", "demoindex/type1").option("es.nodes", "*ES IP*").save()
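The same write can be expressed with an explicit options map, which makes it easier to experiment with connector settings. The sketch below is a hypothetical helper, not part of the original job. It assumes the slow start may come from ES-Hadoop's node discovery, where the connector asks the cluster for its node addresses and then tries to reach them. `es.nodes.wan.only` is a real ES-Hadoop setting that disables discovery so the connector only talks to the addresses in `es.nodes`. Whether it helps in this setup is an assumption to test, not a confirmed fix.

```python
from typing import Dict


def es_write_options(nodes: str, resource: str, port: int = 9200,
                     wan_only: bool = False) -> Dict[str, str]:
    """Build an ES-Hadoop option map to pass to DataFrameWriter.options(**opts).

    Hypothetical helper for illustration; values are strings, as Spark options are.
    """
    return {
        "es.nodes": nodes,          # address(es) the connector contacts first
        "es.port": str(port),       # REST port, 9200 by default
        "es.resource": resource,    # "index/type" form used by ES 6.x
        # "true" disables node discovery: only the addresses in es.nodes are used
        "es.nodes.wan.only": "true" if wan_only else "false",
    }
```

Usage, reusing the question's DataFrame and placeholder address: `opts = es_write_options("*ES IP*", "demoindex/type1", wan_only=True)`, then `vgDF.write.format("org.elasticsearch.spark.sql").options(**opts).mode("append").save()`.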
Spark specifications:
Spark 2.1.0
3 CPUs x 10 GB RAM x 6 executors
Running on 3 GCE nodes
Elasticsearch specifications:
8 CPUs x 30 GB RAM, single node
ES Versions:
Elasticsearch: 6.2.2
ES-Hadoop: 6.2.2
For context, the Spark job reads data from Cassandra, processes the results (this step is fast, taking around 1 to 2 minutes), and then writes to Elasticsearch.
Any help would be greatly appreciated.
[EDIT]
I have also tried varying the data size from millions of records down to just 960 records, but the initial delay stays the same (approximately 15 minutes).
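Because the delay barely changes between 960 records and millions of records, it behaves like a fixed cost rather than a per-record one. A rough way to quantify that is sketched below. It assumes total run time is approximately linear in record count, so two timed runs of different sizes give the fixed overhead and the per-record cost. The function name is hypothetical, and the measurements would have to come from your own runs.

```python
from typing import Tuple


def split_fixed_cost(small: Tuple[int, float],
                     large: Tuple[int, float]) -> Tuple[float, float]:
    """Fit time = fixed + per_record * n through two (records, seconds) runs.

    Returns (fixed_seconds, seconds_per_record).
    """
    (n1, t1), (n2, t2) = small, large
    if n1 == n2:
        raise ValueError("need two runs with different record counts")
    per_record = (t2 - t1) / (n2 - n1)  # slope of the line through both points
    fixed = t1 - per_record * n1        # intercept: time spent before any record counts
    return fixed, per_record
```

If the fixed term accounts for nearly all of the run time, the bottleneck is in setup (for example connecting or node discovery) rather than in indexing throughput.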
It looks like the connection to ES is timing out. Check that ES is reachable on the IP address you are providing. If you are using a public IP, try switching to the private IP.
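One way to check reachability without involving Spark is a plain TCP connection test. The write tasks run on the executors, so the check should pass from each GCE node hosting an executor, not just from the driver. The sketch below uses only the Python standard library. It assumes ES listens on its default HTTP port 9200, and `probe_es` is a hypothetical name.

```python
import socket
import time
from typing import Tuple


def probe_es(host: str, port: int = 9200, timeout: float = 5.0) -> Tuple[bool, float]:
    """Try to open a TCP connection to host:port.

    Returns (reachable, seconds_taken). A failure, or a time close to
    `timeout`, suggests the address is not usable from this machine.
    """
    start = time.perf_counter()
    try:
        # Resolves the host and connects, giving up after `timeout` seconds
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Covers refused connections, unreachable hosts, DNS errors and timeouts
        reachable = False
    return reachable, time.perf_counter() - start
```

Running it once with the public IP and once with the private IP on each executor node shows which address the executors can actually use.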