apache-flink

Flink webUI - GC time


In the Flink web UI, under TaskManager --> Advanced, garbage collection details are shown (I have attached a screenshot with the relevant fields highlighted in red).

I assume the garbage collection time is in milliseconds, but I couldn't find this in the Flink documentation. Is my assumption correct?

I am using the Docker image "flink:1.18.1-java8", running in Kubernetes. I tried running the jcmd and jstat commands to monitor TaskManager memory, but they are not available: when I checked opt/java/bin, these tools were not there. How can I monitor JVM heap usage?

[Screenshots: Flink web UI, TaskManager Advanced section, garbage collection details highlighted in red]

Thanks in advance!


Solution

  • Flink publishes a wide variety of metrics for running jobs, covering resources, throughput, garbage collection, checkpointing, buffers, and much more. I'd highly recommend taking advantage of these using something like Prometheus and/or Grafana to increase visibility into your Flink jobs.

    They are built in and typically only require you to expose them on the appropriate port, where they can either be scraped (for example via a ServiceMonitor) or accessed directly through the exposed port for your job.

    They should contain all of the information that you are looking for and much, much more.

    I assume the garbage collection time is in milliseconds, but I couldn't find this in the Flink documentation. Is my assumption correct?

    The Garbage Collection metrics report GC time in milliseconds, so your assumption is correct.

    How can I monitor JVM heap usage?

    Similarly, the Memory metrics expose a variety of memory-related values (heap, non-heap, direct, and mapped memory) at the TaskManager level. What is exposed can vary depending on the JVM version you are targeting, but you should find everything you need there.
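If you don't have a metrics reporter set up yet, the same values can also be read from Flink's REST API, which is what the web UI uses. The following is only a rough sketch of how that could look, not an official client: the helper names are made up for illustration, and the base URL (for example http://localhost:8081 through a port-forward to the JobManager) and the TaskManager ID (listed under /taskmanagers) are assumptions you will need to adapt to your cluster.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Heap metric names from the Status.JVM.Memory scope of Flink's metrics docs.
HEAP_METRICS = (
    "Status.JVM.Memory.Heap.Used",
    "Status.JVM.Memory.Heap.Committed",
    "Status.JVM.Memory.Heap.Max",
)
GC_PREFIX = "Status.JVM.GarbageCollector."


def metrics_url(base_url, taskmanager_id, names=()):
    """Build the TaskManager metrics URL; without names it lists all metric ids."""
    url = f"{base_url.rstrip('/')}/taskmanagers/{taskmanager_id}/metrics"
    if names:
        url += "?get=" + quote(",".join(names), safe=",")
    return url


def parse_metrics(payload):
    """Convert the JSON list [{"id": ..., "value": ...}, ...] to {id: float}."""
    return {entry["id"]: float(entry["value"]) for entry in json.loads(payload)}


def gc_metric_names(listing_payload):
    """Pick the garbage collector metric ids out of the full metric listing."""
    ids = (entry["id"] for entry in json.loads(listing_payload))
    return [metric_id for metric_id in ids if metric_id.startswith(GC_PREFIX)]


def heap_usage_percent(metrics):
    """Return used heap as a percentage of max heap, or None if max is undefined."""
    maximum = metrics["Status.JVM.Memory.Heap.Max"]
    if maximum <= 0:  # the JVM reports -1 when no maximum is defined
        return None
    return 100.0 * metrics["Status.JVM.Memory.Heap.Used"] / maximum


def fetch_heap_metrics(base_url, taskmanager_id):
    """Fetch the heap metrics over HTTP (assumes the REST endpoint is reachable)."""
    with urlopen(metrics_url(base_url, taskmanager_id, HEAP_METRICS)) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

Calling fetch_heap_metrics with your base URL and TaskManager ID and passing the result to heap_usage_percent gives a quick heap reading without jcmd or jstat. Requesting metrics_url without any names and running the response through gc_metric_names shows which GC metrics your JVM actually exposes, since the collector names depend on the garbage collector in use.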