scala, apache-spark, elasticsearch, elasticsearch-hadoop, elasticsearch-spark

Retrieve metrics from elasticsearch-spark


At the end of an ETL Cascading job, I extract metrics about the Elasticsearch ingestion from the Hadoop counters that elasticsearch-hadoop exposes.
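For reference, this is roughly how the Cascading side reads those counters today. It is a minimal sketch based on the Cascading stats API as I understand it: rather than relying on the exact counter names published by elasticsearch-hadoop (which vary by version), it simply dumps every counter group exposed by the flow stats.

    import cascading.flow.Flow
    import scala.collection.JavaConverters._

    // Sketch: once the Cascading flow has completed, dump every Hadoop
    // counter it collected, including those published by elasticsearch-hadoop.
    def printIngestionCounters(flow: Flow[_]): Unit = {
      flow.complete()                 // block until the flow finishes
      val stats = flow.getFlowStats   // cascading.stats.FlowStats
      for {
        group   <- stats.getCounterGroups.asScala
        counter <- stats.getCountersFor(group).asScala
      } println(s"$group.$counter = ${stats.getCounterValue(group, counter)}")
    }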

I want to do the same with Spark, but I can't find any documentation about metrics for the Spark connector.
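For context, the Spark side of the ingestion is just the usual saveToEs call from elasticsearch-spark; what I am missing is a way to read back counters (documents/bytes sent, retries, etc.) after it returns. A minimal sketch of the write, with a made-up index name and sample data:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._   // adds saveToEs to RDDs

    object EsIngestion {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("es-ingestion")
          .set("es.nodes", "localhost:9200")   // assumption: local ES just for the example
        val sc = new SparkContext(conf)

        val docs = sc.parallelize(Seq(
          Map("id" -> 1, "msg" -> "hello"),
          Map("id" -> 2, "msg" -> "world")
        ))

        docs.saveToEs("my-index/doc")   // hypothetical index/type

        // <-- here is where I would like to read the connector's ingestion
        //     counters, the way FlowStats exposes them in Cascading.
        sc.stop()
      }
    }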

Not always, but usually we run the job on EMR (Hadoop), so maybe the Spark connector uses Hadoop in the same way the Cascading connector does. Still, I don't think that's the case: the counters seem to exist only for the "MapReduce-style" connectors such as Cascading.

So my questions are:

  1. How can I extract metrics from the Elasticsearch Spark connector?
  2. If the connector is using Hadoop counters, how can I access them from Spark when running on Hadoop YARN? (A sketch of what I mean by "reading counters" is below.)
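To make question 2 concrete, this is what reading Hadoop counters looks like in a plain MapReduce job, where the driver holds a Job handle after completion. The question is whether anything equivalent is reachable when the write goes through the Spark connector instead of a MapReduce OutputFormat. The group/name arguments are placeholders.

    import org.apache.hadoop.mapreduce.Job

    // Sketch: in plain MapReduce the driver can query any counter by
    // group and name once the job has finished.
    def printCounter(job: Job, group: String, name: String): Unit = {
      if (job.isComplete) {
        val value = job.getCounters.findCounter(group, name).getValue
        println(s"$group.$name = $value")
      }
    }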

Versions:


Solution

  • Basically, it is not implemented. There is a ticket in the Spark issue tracker, opened on 01/Apr/16 and still open with no activity since, so nobody is working on it.

    There is also a discussion on the Elasticsearch forum, but no workaround was found.