google-cloud-platform monitoring stackdriver google-cloud-monitoring

gcp monitoring "Any time series violates" vs "All time series violate"



What's the difference between the two options "Any time series violates" and "All time series violate"? I can easily imagine what the former does, but I have no idea what the latter does.

All time series? How long is its range? And why does it have a "for" option?


Solution

  • What's the difference between the two options "Any time series violates" and "All time series violate"? I can easily imagine what the former does, but I have no idea what the latter does.

    First, what does "time series violates" mean? It's when the current value of the metric is outside the expected range, e.g. above the specified threshold.

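    Second, "any/all/percent/number": let's say you have 5 time series, e.g. CPU usage on 5 instances. Depending on the dropdown option, the whole alert condition violates when:

      • any: at least one of the 5 time series violates
      • all: all 5 time series violate at the same time
      • percent: at least the given percentage of them violate (e.g. 40% means 2 of 5)
      • number: at least the given number of them violate (e.g. 3 means 3 of 5)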
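
    As a rough illustration of the dropdown semantics, here is a minimal Python sketch. It is not the Cloud Monitoring API: the function name condition_violates, the mode strings, and the sample CPU values are all made up for this example, and it assumes that "violates" means "current value above the threshold".

```python
from typing import Sequence


def condition_violates(
    values: Sequence[float],
    threshold: float,
    mode: str,
    amount: float = 0,
) -> bool:
    """Return True if the whole alert condition violates.

    values: most recent value of each time series (one per instance).
    mode: "any", "all", "percent" or "number" (hypothetical names that
          mirror the dropdown options).
    amount: the percentage (0-100) for "percent", or the count for "number".
    """
    if not values:
        return False
    # A single series violates when its current value is above the threshold.
    violating = sum(1 for v in values if v > threshold)
    if mode == "any":
        return violating >= 1
    if mode == "all":
        return violating == len(values)
    if mode == "percent":
        return violating * 100 >= amount * len(values)
    if mode == "number":
        return violating >= amount
    raise ValueError(f"unknown mode: {mode}")


cpu = [0.95, 0.40, 0.92, 0.30, 0.10]  # CPU usage on 5 instances (made-up data)
print(condition_violates(cpu, 0.90, "any"))          # True: 2 series are above 90%
print(condition_violates(cpu, 0.90, "all"))          # False: only 2 of 5 violate
print(condition_violates(cpu, 0.90, "percent", 40))  # True: 2 of 5 is 40%
print(condition_violates(cpu, 0.90, "number", 3))    # False: fewer than 3 violate
```

    With a single time series (one instance), all four modes give the same answer in this sketch, so the choice only matters when the condition covers several series.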

    Third, the "for" option (the Duration box): it looks like it means "if my time series violates for 5 minutes, then violate the condition". For some simpler alerts this can even work, but once you combine it with, say, "metric is absent" or another complicated configuration, you will see that what actually happens is "wait 5 minutes after the problem appears, and only then trigger the violation".

    In practice, using the "for" field is discouraged, and it's better to keep it at the default, "Most recent value".

    If you do need "CPU usage is above 90% for 5 minutes", then the correct way to do it is by denoising/smoothing your data:
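
    To make the smoothing idea concrete, here is a small Python sketch. It assumes one sample per minute and uses a rolling mean over the last 5 samples as the smoothing step; the helper rolling_mean and the data are invented for illustration, and the exact smoothing the alert should use is an assumption here. In Cloud Monitoring the rough equivalent would be a mean aligner with a 5-minute alignment period, with the threshold then checked against the most recent value.

```python
from collections import deque
from typing import Iterable, List


def rolling_mean(samples: Iterable[float], window: int) -> List[float]:
    """Mean of the last `window` samples at each point (fewer at the start)."""
    recent = deque(maxlen=window)
    out = []
    for s in samples:
        recent.append(s)
        out.append(sum(recent) / len(recent))
    return out


# One CPU sample per minute (made-up data): a one-minute spike, then sustained load.
cpu = [0.50, 0.97, 0.50, 0.50, 0.50, 0.95, 0.96, 0.94, 0.97, 0.95]
smoothed = rolling_mean(cpu, window=5)

# Compare the most recent smoothed value against the threshold at each minute.
alerts = [value > 0.90 for value in smoothed]
print(alerts)
```

    In this sketch the one-minute spike at the second sample never pushes the smoothed value above 90%, while the sustained load at the end does once it fills the 5-minute window. A rolling minimum instead of a mean would be the stricter reading of "above 90% for 5 minutes".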