UPDATE:
The actual problem is different from what I've described. I'll provide an update/edit to this ticket once we resolve the issue. More details may be found in this thread - https://techcommunity.microsoft.com/t5/Azure-Log-Analytics/Reliably-trigger-alerts-for-Log-Analytics-log-entries/m-p/319315/highlight/false#M1224
Original question:
We use Azure Monitor to create alerts based on logs in Log Analytics. For this we choose our Log Analytics account as the "RESOURCE", then choose the "Custom log search" signal name for "CONDITION". Alert logic: "Number of results greater than 0".
Sample query:
search *
| where ResourceProvider == "MICROSOFT.DATAFACTORY" and status_s == "Failed"
For Period and Frequency, let's set 15 minutes. All looks simple, but...
The issue: the setup described above does not work reliably. Alerts fire only sometimes and many of them are missed, which is completely unacceptable behavior.
If we set Period = Frequency = 5 minutes, we miss almost every event. Period = Frequency = 15 minutes works better, but a lot of events are still missed. Period = Frequency = 30 minutes works even better, but all of this looks weird.
Important note: logs are collected from Data Factory V2 into Log Analytics. I suspect that alerts are missed because logs are delivered to Log Analytics with some delay (up to several minutes). So when Azure Monitor evaluates the alert query for the last 15 minutes (Period = 15), the most recent log entries might not be in Log Analytics yet. When the next query evaluation runs 15 minutes later, it will miss the logs that were ingested with a delay for the previous 15-minute interval. Is this assumption correct? If so, this is very weird: how are we then supposed to configure the Period and Frequency values? If I set Period > Frequency (e.g. Period = 30 and Frequency = 5, which means "evaluate the expression every 5 minutes, taking data for the last 30 minutes from the current time"), then we get multiple duplicate alerts, because Period is larger than Frequency and there is a good chance of the log search query returning the same log entries every 5 minutes. This is highly undesirable behavior.
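To make the suspicion above concrete, here is a minimal Python sketch of the two setups. It is only a toy model of my assumption, not a description of how Azure Monitor actually evaluates alert rules: the names `LogEntry`, `evaluate` and `simulate` are hypothetical, the timestamps are made up, and I assume each evaluation at time `now` only sees entries whose event time falls in the window (now - Period, now] and that were already ingested by `now`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    event_time: int   # minute at which the pipeline failure happened
    ingested_at: int  # minute at which the entry became queryable (assumed delay)


def evaluate(entries, now, period):
    """Entries a query run at `now` would return for the window (now - period, now]."""
    return [
        e for e in entries
        if now - period < e.event_time <= now and e.ingested_at <= now
    ]


def simulate(entries, period, frequency, horizon):
    """Run the query every `frequency` minutes up to `horizon`.

    Returns (missed, duplicated): entries never returned by any evaluation,
    and entries returned by more than one evaluation.
    """
    hits = {e: 0 for e in entries}
    now = frequency
    while now <= horizon:
        for e in evaluate(entries, now, period):
            hits[e] += 1
        now += frequency
    missed = [e for e, n in hits.items() if n == 0]
    duplicated = [e for e, n in hits.items() if n > 1]
    return missed, duplicated


# Made-up sample data: one entry ingested quickly, two ingested after
# the evaluation that covers their event time has already run.
ENTRIES = [
    LogEntry(event_time=3, ingested_at=4),
    LogEntry(event_time=14, ingested_at=18),
    LogEntry(event_time=29, ingested_at=32),
]

for period, frequency in [(15, 15), (30, 5)]:
    missed, duplicated = simulate(ENTRIES, period, frequency, horizon=60)
    print(f"Period={period} Frequency={frequency}: "
          f"missed={len(missed)} duplicated={len(duplicated)}")
```

With these made-up numbers, Period = Frequency = 15 never returns the two late entries, while Period = 30 and Frequency = 5 returns every entry in more than one evaluation, which is exactly the missed-versus-duplicated trade-off I am describing.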
The issue turned out to be buggy behavior of the ARM template that creates the alerts. Thanks to Stanislav Zhelyazkov, it has been nailed down and resolved - I now use an alternative ARM API and it seems to work fine. More details on the topic may be found here - https://techcommunity.microsoft.com/t5/Azure-Log-Analytics/Reliably-trigger-alerts-for-Log-Analytics-log-entries/m-p/309610.