Tags: apache-spark, kubernetes, time-series, monitoring, graphite

Can Spark collect monitoring metrics while running and push them to a time series database?


We run Spark on Kubernetes and spin up a Spark driver and executors for many of our tasks (these are not Spark jobs themselves). After a task finishes we spin the cluster down (on Kubernetes) and spin up another one when needed, and many of these can be running simultaneously.

Because the clusters are short-lived, pull-based monitoring is not possible. Is there a way to push the executors' metrics through the Spark driver instead of pulling them from the API?


Solution

  • It is possible to do this with one of the built-in sinks or by creating your own.

    For example, you can use the GraphiteSink to push metrics to Graphite, or the StatsdSink to send them to StatsD. You can also use other platforms that speak the same protocols, for example Elasticsearch with the Graphite Metricbeat module to receive what the GraphiteSink emits.

    See the Spark monitoring documentation for the full list of built-in sinks.
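    As a minimal sketch, a GraphiteSink can be enabled through Spark's metrics configuration (typically `conf/metrics.properties`, or passed via `spark.metrics.conf.*` properties). The host name and reporting period below are placeholder assumptions; adjust them for your environment:

    ```
    # Push metrics from all instances (driver and executors) to Graphite.
    # "graphite.example.com" is a placeholder host; 2003 is Graphite's
    # default plaintext protocol port.
    *.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
    *.sink.graphite.host=graphite.example.com
    *.sink.graphite.port=2003
    *.sink.graphite.period=10
    *.sink.graphite.unit=seconds

    # Optional prefix so metrics from different clusters don't collide.
    *.sink.graphite.prefix=spark-on-k8s
    ```

    The `*.` prefix applies the sink to every component, so each executor pushes its own metrics on the configured period; because the executors push rather than being scraped, this works even for short-lived clusters.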