prometheus promql

Use max by in PromQL query while keeping a label that is not part of the "by"


I'm trying to join two metrics (build_status and build_timestamp) and then filter out duplicates based on specific labels (service and timestamp).
This is what I came up with:

max by (service, timestamp) (
  build_status{stage="$stage", envId="$env"} * on(stage, env, service, status) 
    group_left(timestamp) build_timestamp{stage="$stage", env="$env"}
)

This works well, except that the status label is lost. I can bring it back by adding it to the by clause (i.e. max by (service, timestamp, status)), but then I get duplicates, for example:

service1 timestamp1 status1
service1 timestamp1 status2

How do I max by the service and timestamp but keep the status label?
Thanks!


Solution

  • From the Prometheus documentation:

    topk and bottomk are different from other aggregators in that a subset of the input samples, including the original labels, are returned in the result vector. by and without are only used to bucket the input vector.

    So you can use topk for your case:

    topk by (service, timestamp) (1,
      build_status{stage="$stage", envId="$env"} * on(stage, env, service, status)
        group_left(timestamp) build_timestamp{stage="$stage", env="$env"}
    )
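
To make the difference concrete, here is a minimal Python sketch that models the two aggregators on a few made-up series. It is not Prometheus code: the functions `group`, `max_by`, and `topk_by`, and the sample label and sample values, are hypothetical and exist only to illustrate the bucketing behaviour described in the quote above.

```python
from collections import defaultdict

# Hypothetical stand-in for the joined series (result of the `* on(...) group_left(...)`).
# Each entry is (labels, value); all label values and sample values are made up.
series = [
    ({"service": "service1", "timestamp": "timestamp1", "status": "status1"}, 1.0),
    ({"service": "service1", "timestamp": "timestamp1", "status": "status2"}, 2.0),
    ({"service": "service2", "timestamp": "timestamp2", "status": "status1"}, 1.0),
]

def group(series, by):
    """Bucket series by the values of the `by` labels."""
    buckets = defaultdict(list)
    for labels, value in series:
        key = tuple(labels.get(name, "") for name in by)
        buckets[key].append((labels, value))
    return buckets

def max_by(series, by):
    """Model of `max by (...)`: one output per bucket, carrying only the `by` labels."""
    return [
        (dict(zip(by, key)), max(v for _, v in members))
        for key, members in group(series, by).items()
    ]

def topk_by(series, by, k):
    """Model of `topk by (...) (k, ...)`: up to k input series per bucket, labels intact."""
    result = []
    for members in group(series, by).values():
        ranked = sorted(members, key=lambda m: m[1], reverse=True)
        result.extend(ranked[:k])
    return result

by = ["service", "timestamp"]
max_result = max_by(series, by)       # status label is gone
topk_result = topk_by(series, by, 1)  # status label survives

for labels, value in topk_result:
    print(labels, value)
```

In this model, `max_by` produces one series per (service, timestamp) pair with no status label, while `topk_by` returns one of the original series per pair with status still attached. Which status survives depends on the sample values, since the series with the highest value in each bucket wins. Ties here fall to the first series in input order, which is a property of this sketch only.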
    
