apache-spark spark-structured-streaming

What do these metrics mean in Spark Structured Streaming?


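import org.apache.spark.sql.streaming.StreamingQueryListener
import org.apache.spark.sql.streaming.StreamingQueryListener.QueryProgressEvent

spark.streams.addListener(new StreamingQueryListener() {
    ......
    // Called after each trigger; progress holds that trigger's metrics.
    override def onQueryProgress(queryProgress: QueryProgressEvent): Unit = {
        println("Query made progress: " + queryProgress.progress)
    }
    ......
})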

When a StreamingQueryListener is added to a Spark Structured Streaming session and the queryProgress is printed on every trigger, one of the metrics you get is durationMs:

Query made progress: {
  ......
  "durationMs" : {
    "addBatch" : 159136,
    "getBatch" : 0,
    "getEndOffset" : 0,
    "queryPlanning" : 38,
    "setOffsetRange" : 14,
    "triggerExecution" : 159518,
    "walCommit" : 182
  }
  ......
}

Can anyone tell me what the sub-metrics in durationMs mean? For example, what does "addBatch: 159136" mean?


Solution

  • https://www.waitingforcode.com/apache-spark-structured-streaming/query-metrics-apache-spark-structured-streaming/read

    This is an excellent article that covers these metrics and more, so the credit goes to that site.
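
    As a rough summary, to the best of my understanding of the micro-batch engine: every value in durationMs is in milliseconds. triggerExecution covers the whole trigger, addBatch is the time spent actually running the batch and writing its result to the sink, queryPlanning is the time spent building the execution plan, and walCommit is the time spent writing the offsets to the offset (write-ahead) log. In the sample above, addBatch therefore accounts for almost the entire trigger. A minimal sketch for checking this on your own query is below. DurationBreakdown and breakdown are hypothetical names of mine, not part of Spark; the only Spark detail assumed is that progress.durationMs is a java.util.Map[String, java.lang.Long].

```scala
import java.util.{Map => JMap}

// Hypothetical helper (not part of Spark): ranks the phases reported in
// durationMs by their share of the total trigger time.
object DurationBreakdown {
  // Returns (phase, millis, percent of triggerExecution), slowest phase first.
  // triggerExecution itself is left out because it is the total.
  def breakdown(durationMs: JMap[String, java.lang.Long]): List[(String, Long, Double)] = {
    // A missing triggerExecution entry is treated as 0, which yields 0.0 shares.
    val total = Option(durationMs.get("triggerExecution")).map(_.longValue).getOrElse(0L)

    val phases = scala.collection.mutable.ArrayBuffer.empty[(String, Long)]
    val it = durationMs.entrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      if (e.getKey != "triggerExecution") phases += ((e.getKey, e.getValue.longValue))
    }

    phases.toList
      .sortBy { case (_, ms) => -ms }
      .map { case (name, ms) =>
        val share = if (total > 0) 100.0 * ms / total else 0.0
        (name, ms, share)
      }
  }
}
```

    Inside onQueryProgress it could be called as DurationBreakdown.breakdown(queryProgress.progress.durationMs).foreach(println). With the numbers from the question, addBatch comes first at roughly 99.8% of triggerExecution, which suggests the time goes into processing and writing the batch rather than into planning or offset bookkeeping.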