java apache-spark jmx spark-streaming codahale-metrics

Spark Streaming custom metrics


I'm working on a Spark Streaming program that reads a Kafka stream, applies a very basic transformation to it, and then inserts the data into a DB (VoltDB, if that's relevant). I'm trying to measure the rate at which I insert rows into the DB. I think metrics (exposed via JMX) could be useful here, but I can't find out how to add custom metrics to Spark. I've looked at Spark's source code and also found this thread, but it doesn't work for me. I also enabled the JMX sink in the metrics.properties file. The problem is that my custom metrics don't show up in JConsole.

Could someone explain how to add custom metrics (preferably via JMX) to Spark Streaming? Or, alternatively, how to measure my insertion rate into the DB (specifically VoltDB)? I'm using Spark with Java 8.


Solution

  • OK, after digging through the source code I found out how to add my own custom metrics. It requires three things:

    1. Create my own custom source. Sort of like this
    2. Enable the JMX sink in the Spark metrics.properties file. The specific line I used is: *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink, which enables JmxSink for all instances
    3. Register my custom source in the SparkEnv metrics system. An example of how to do this can be seen here. I had actually viewed this link before but missed the registration part, which is why I couldn't see my custom metrics in JVisualVM
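For step 2, the sink setting can be sketched as a metrics.properties fragment. The first line is the one quoted above. The two commented lines are an assumption about how one might narrow the sink to a single instance, based on the `[instance].sink.[name].[option]` key syntax that Spark's metrics configuration uses.

```properties
# Enable the JMX sink for every instance (driver, executor, master, worker, ...)
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

# Alternative (assumed variant): enable it only for selected instances
# driver.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
# executor.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
```

Spark reads this file from conf/metrics.properties by default, or from the path given in the spark.metrics.conf setting.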

    I'm still struggling with how to actually count the number of insertions into VoltDB, because the code runs on the executors, but that's a subject for a different topic :)
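One possible starting point for the insertion count is a plain JMX MBean registered in whichever JVM performs the inserts. This is only a sketch that bypasses Spark's metrics system and uses the JDK alone. The names InsertMetrics, InsertCounter, recordInserted, and the example.metrics JMX domain are hypothetical, and the VoltDB insert itself is not shown. Because each executor is a separate JVM, a counter like this would report per-executor numbers, not a cluster-wide total.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.LongAdder;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical names throughout; nothing here is part of Spark or VoltDB.
class InsertMetrics {

    // Standard MBean contract: each getter becomes a read-only JMX attribute.
    // The interface must be public and named <ImplementationClass>MBean.
    public interface InsertCounterMBean {
        long getInsertedRows();
        double getRowsPerSecond();
    }

    public static final class InsertCounter implements InsertCounterMBean {
        private final LongAdder rows = new LongAdder();       // cheap under contention
        private final long startNanos = System.nanoTime();

        // Call this after each successful insert (or batch of inserts).
        public void recordInserted(long n) {
            rows.add(n);
        }

        @Override
        public long getInsertedRows() {
            return rows.sum();
        }

        // Average rate since the counter was created, not a sliding window.
        @Override
        public double getRowsPerSecond() {
            double elapsedSeconds = (System.nanoTime() - startNanos) / 1e9;
            return elapsedSeconds > 0 ? rows.sum() / elapsedSeconds : 0.0;
        }
    }

    // Registers a fresh counter with this JVM's platform MBean server,
    // replacing any counter previously registered under the same name.
    static InsertCounter register(String name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName objectName =
            new ObjectName("example.metrics:type=InsertCounter,name=" + name);
        if (server.isRegistered(objectName)) {
            server.unregisterMBean(objectName);
        }
        InsertCounter counter = new InsertCounter();
        server.registerMBean(counter, objectName);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        InsertCounter counter = register("voltdb");
        counter.recordInserted(500);   // stand-in for a real batch insert
        counter.recordInserted(250);

        // Read the value back the same way JConsole or JVisualVM would.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        Object value = server.getAttribute(
            new ObjectName("example.metrics:type=InsertCounter,name=voltdb"),
            "InsertedRows");
        System.out.println("InsertedRows = " + value);
    }
}
```

In a streaming job, the counter would presumably be created once per JVM (for example, held in a static field) and recordInserted called from the code that writes each partition. Attaching JConsole to that JVM should then show the InsertedRows and RowsPerSecond attributes under the example.metrics domain.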

    I hope this helps others.