
How to monitor Apache Spark with Prometheus?


I have read that Spark does not ship with Prometheus as one of its pre-packaged sinks, so I found this post on how to monitor Apache Spark with Prometheus.

But I found it difficult to understand and to get working, because I am a beginner and this is my first time working with Apache Spark.

The first thing I do not understand is what I need to do.

I do not understand what the steps are to make it work.

What I am doing so far is changing the properties as described in the link and passing this option:

--conf spark.metrics.conf=<path_to_the_file>/metrics.properties

What else do I need to do to see metrics from Apache Spark?

I also found this link: Monitoring Apache Spark with Prometheus

https://argus-sec.com/monitoring-spark-prometheus/

But I could not get it working with that either.

I have read that there is a way to get metrics from Graphite and then export them to Prometheus, but I could not find any useful documentation.


Solution

  • There are a few ways to monitor Apache Spark with Prometheus.

    One of them is JmxSink + jmx-exporter.

    Preparations
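
    A minimal sketch of this preparation, assuming Spark reads its metrics configuration from $SPARK_CONF_DIR/metrics.properties (./conf is used as a fallback here) and that the agent jar has already been downloaded from the jmx_exporter project. It enables the JmxSink for all instances and writes a deliberately permissive spark.yml that exports every MBean. The catch-all rule is an illustrative assumption, not the example config the jmx_exporter project provides for Spark.

```shell
# Directory holding Spark's config files; ./conf is an assumed fallback.
CONF_DIR="${SPARK_CONF_DIR:-./conf}"
mkdir -p "$CONF_DIR"

# Enable the JMX sink for all instances (master, worker, driver, executor).
# This line ships commented out in Spark's metrics.properties.template.
echo '*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink' >> "$CONF_DIR/metrics.properties"

# Minimal jmx-exporter config (assumption): export every MBean attribute
# and lowercase the resulting metric names.
cat > spark.yml <<'EOF'
lowercaseOutputName: true
rules:
  - pattern: ".*"
EOF
```

    A narrower set of rules keeps the number of exported series down once it is clear which metrics are actually needed.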

    Use it in spark-shell or spark-submit

    In the following command, the jmx_prometheus_javaagent-0.3.1.jar file and spark.yml are the ones downloaded in the previous steps, and 8080 is the port the agent listens on. Adjust the paths and port as needed.

    bin/spark-shell --conf "spark.driver.extraJavaOptions=-javaagent:jmx_prometheus_javaagent-0.3.1.jar=8080:spark.yml" 
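
    The same agent option can be passed to spark-submit. The sketch below only assembles and prints the command so the pieces are visible. The application class and jar (com.example.MyApp, my-app.jar) are hypothetical placeholders.

```shell
AGENT_JAR="jmx_prometheus_javaagent-0.3.1.jar"   # agent jar from the preparation step
AGENT_PORT=8080                                   # port the agent's HTTP endpoint listens on
AGENT_CONFIG="spark.yml"                          # jmx-exporter config file

# -javaagent:<jar>=<port>:<config> is the argument format the agent expects.
JAVA_AGENT_OPT="-javaagent:${AGENT_JAR}=${AGENT_PORT}:${AGENT_CONFIG}"

# Print the command instead of running it; the class and jar are placeholders.
SUBMIT_CMD="bin/spark-submit --conf \"spark.driver.extraJavaOptions=${JAVA_AGENT_OPT}\" --class com.example.MyApp my-app.jar"
echo "$SUBMIT_CMD"
```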
    

    Access it

    Once it is running, the metrics are available at localhost:8080/metrics.
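
    A quick way to confirm the endpoint is serving data is to count the samples it returns. The sketch below defines a small helper (count_samples, a hypothetical name) that counts the non-comment, non-blank lines of Prometheus text-format output read from stdin.

```shell
# Count metric samples in Prometheus text format read from stdin:
# drop "# HELP" / "# TYPE" comment lines, then count the non-empty lines left.
count_samples() {
  grep -v '^#' | grep -c -v '^$'
}

# Usage against the running agent (requires spark-shell started with the agent):
#   curl -s http://localhost:8080/metrics | count_samples
```

    A count of zero, or a connection error from curl, suggests the agent did not start or is listening on a different port.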

    Next

    Prometheus can then be configured to scrape the metrics from jmx-exporter.
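
    A minimal sketch of such a scrape configuration, assuming Prometheus runs on the same host as the Spark driver and the agent listens on port 8080 as in the command above. The job name spark-driver is a hypothetical label.

```shell
# Write a minimal Prometheus config with one static scrape target.
cat > prometheus.yml <<'EOF'
scrape_configs:
  - job_name: "spark-driver"          # hypothetical job name
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:8080"]   # jmx-exporter agent; metrics path defaults to /metrics
EOF

# Then start Prometheus with this file, for example:
#   prometheus --config.file=prometheus.yml
```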

    NOTE: If Spark is running in a cluster environment, the target discovery part has to be handled properly.
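
    One way to handle discovery is Prometheus's file-based service discovery, where a separate process keeps a target file up to date as drivers come and go. A minimal sketch, assuming two hypothetical hosts (spark-node-1, spark-node-2) each running a driver with the agent on port 8080. The file names and the app label are also assumptions.

```shell
mkdir -p targets

# Target file; Prometheus re-reads it when it changes, so it can be regenerated
# whenever Spark applications start or stop.
cat > targets/spark.json <<'EOF'
[
  {
    "targets": ["spark-node-1:8080", "spark-node-2:8080"],
    "labels": { "app": "spark" }
  }
]
EOF

# Scrape config that picks up every JSON target file in the targets directory.
cat > prometheus-cluster.yml <<'EOF'
scrape_configs:
  - job_name: "spark"
    file_sd_configs:
      - files: ["targets/*.json"]
EOF
```

    How the target file gets regenerated (a cron job, a hook in the submit script, or a cluster manager integration) depends on the environment and is left open here.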