apache-flink flink-streaming prometheus-pushgateway

Why does Flink use the Pushgateway instead of Prometheus's usual pull model for general metrics collection?


When exposing Flink metrics to an external system such as Prometheus, Flink appears to use the Pushgateway instead of Prometheus's usual pull model. This is the reporter's report() method:

@Override
public void report() {
    try {
        pushGateway.push(CollectorRegistry.defaultRegistry, jobName);
    } catch (Exception e) {
        log.warn("Failed to push metrics to PushGateway with jobName {}.", jobName, e);
    }
}

https://github.com/apache/flink/blob/master/flink-metrics/flink-metrics-prometheus/src/main/java/org/apache/flink/metrics/prometheus/PrometheusPushGatewayReporter.java

However, the official Prometheus documentation linked below states that "Prometheus scrapes metrics from instrumented jobs, either directly or via an intermediary push gateway for short-lived jobs". A Flink streaming job is clearly not a short-lived job, so why does Flink use the Pushgateway instead of Prometheus's usual pull model for general metrics collection?

https://prometheus.io/docs/introduction/overview/


Solution

  • Flink offers both the PrometheusPushGatewayReporter and the generally more appropriate pull-based PrometheusReporter. Prometheus has become quite popular with Flink users, and there was interest in the community in supporting both types of connection.
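    As a rough sketch of the pull-based option, the Flink configuration fragment below enables the PrometheusReporter so that Prometheus can scrape each JobManager and TaskManager directly. The reporter name "prom" and the port range are illustrative choices, not required values. The factory.class key is an assumption that holds for recent Flink releases; older releases used metrics.reporter.prom.class with org.apache.flink.metrics.prometheus.PrometheusReporter instead, so check the documentation for your version.

    ```yaml
    # flink-conf.yaml (sketch; "prom" is an arbitrary reporter name)
    # Pull model: each Flink process serves its metrics over HTTP for Prometheus to scrape.
    metrics.reporter.prom.factory.class: org.apache.flink.metrics.prometheus.PrometheusReporterFactory
    # Illustrative port range, so several Flink processes on one host can each bind a free port.
    metrics.reporter.prom.port: 9250-9260
    ```

    With this in place, Prometheus needs a scrape target for every JobManager and TaskManager host and port, which is the usual pull-model setup for long-running streaming jobs.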