heron

How to monitor the throughput of a Heron cluster


I need to measure the throughput of a Heron cluster, but the Heron UI does not show a throughput metric. Do you have any ideas about how to monitor it? Thanks.

The output of running heron-explorer is as follows:

yitian@heron01:~$ heron-explorer metrics aurora/yitian/devel SentenceWordCountTopology
[2018-08-03 21:02:09 +0000] [INFO]: Using tracker URL: http://127.0.0.1:8888
'spout' metrics:
container id           jvm-uptime-secs    jvm-process-cpu-load    jvm-memory-used-mb    emit-count    ack-count    fail-count
-------------------  -----------------  ----------------------  --------------------  ------------  -----------  ------------
container_3_spout_6               2053                0.253257                 146     1.13288e+07  1.13278e+07             0
container_4_spout_7               2091                0.150625                 137.5   1.1624e+07   1.16228e+07           231

'count' metrics:
container id            jvm-uptime-secs    jvm-process-cpu-load    jvm-memory-used-mb    emit-count    execute-count    ack-count    fail-count
--------------------  -----------------  ----------------------  --------------------  ------------  ---------------  -----------  ------------
container_6_count_12               2092                0.184742               155.167             0      4.6026e+07   4.6026e+07              0
container_5_count_9                2091                0.387867               146                 0      4.60069e+07  4.60069e+07             0
container_6_count_11               2092                0.184488               157.833             0      4.58158e+07  4.58158e+07             0
container_4_count_8                2091                0.443688               129.833             0      4.58722e+07  4.58722e+07             0
container_5_count_10               2091                0.382577               118.5               0      4.60091e+07  4.60091e+07             0

'split' metrics:
container id           jvm-uptime-secs    jvm-process-cpu-load    jvm-memory-used-mb    emit-count    execute-count    ack-count    fail-count
-------------------  -----------------  ----------------------  --------------------  ------------  ---------------  -----------  ------------
container_1_split_2               2091                0.143034               75.3333   4.59453e+07      4.59453e+06  4.59453e+06             0
container_3_split_5               2042                1.12248                79.1667   4.64862e+07      4.64862e+06  4.64862e+06             0
container_2_split_3               2150                0.139837               83.6667   4.59443e+07      4.59443e+06  4.59443e+06             0
container_1_split_1               2091                0.145702              104.167    4.59454e+07      4.59454e+06  4.59454e+06             0
container_2_split_4               2150                0.138453              106.333    4.59443e+07      4.59443e+06  4.59443e+06             0
[2018-08-03 21:02:09 +0000] [INFO]: Elapsed time: 0.031s.

Solution

  • You can use the execute-count of your sink component to measure the output of your topology. If each of your components has a 1:1 input:output ratio, then this will be your throughput.

    However, if you are windowing tuples into batches or splitting tuples (like separating sentences into individual words), then things get a little more complicated. You can get the input into your topology by looking at the emit-count of your spout components. You could then compare this with your bolt execute-counts to create your own throughput metric.

    An easy way to get programmatic access to these metrics is via the Heron Tracker REST API. You can use your chosen language's HTTP library (like Requests for Python) to query the last 3 hours of data for a running topology. If you require more than 3 hours of data (the maximum stored by the topology TMaster) you will need to use one of the other metrics sinks to send metrics to an external database. Heron currently provides sinks for saving to local files, Graphite or Prometheus. InfluxDB support is in the works.
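
As a rough illustration of the approach above, here is a minimal Python sketch. It turns a count over a time window into a rate, compares two counts, and builds a Tracker query URL. The helper names are made up for this example. The `/topologies/metrics` path, its query parameters, and the `__emit-count/default` metric name are assumptions based on my recollection of the Tracker API, so check them against the Tracker documentation for your Heron version. Treating the heron-explorer counts as totals since instance start is also an assumption that the output does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Tracker URL as printed in the heron-explorer log in the question.
TRACKER_URL = "http://127.0.0.1:8888"


def tuples_per_second(counts, window_secs):
    """Sum per-instance rates (count / window) into one component-level rate."""
    return sum(c / w for c, w in zip(counts, window_secs) if w > 0)


def count_ratio(downstream_count, upstream_count):
    """Ratio between two counts, e.g. bolt execute-count vs spout emit-count."""
    return downstream_count / upstream_count


def metrics_url(component, metric_name, interval_secs=60):
    # ASSUMPTION: endpoint path and parameter names; verify for your version.
    query = urlencode({
        "cluster": "aurora",
        "role": "yitian",
        "environ": "devel",
        "topology": "SentenceWordCountTopology",
        "component": component,
        "metricname": metric_name,
        "interval": interval_secs,
    })
    return f"{TRACKER_URL}/topologies/metrics?{query}"


def fetch_metric(component, metric_name, interval_secs=60):
    """Query the Tracker and return the decoded JSON response as-is."""
    with urlopen(metrics_url(component, metric_name, interval_secs), timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Spout emit-count and jvm-uptime-secs copied from the output in the question,
    # ASSUMING the counts cover the whole uptime of each instance.
    spout_rate = tuples_per_second([1.13288e7, 1.1624e7], [2053, 2091])
    print(f"approximate topology input: {spout_rate:.0f} tuples/s")
    # Metric name is an assumption; the call needs a reachable Tracker.
    # print(fetch_metric("spout", "__emit-count/default"))
```

The rate and ratio helpers are pure functions, so you can feed them numbers from any source (the Tracker, a file sink, Graphite or Prometheus) without changing them.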