I want to build a logic app that notifies me when any logic app in my environment fails or has a broken connection. This should cover both manually triggered and automated runs.
The closest KQL I've found to achieve this is:
SentinelHealth
| where TimeGenerated > ago(7d)
| where SentinelResourceType == "Automation rule"
| mv-expand TriggeredPlaybooks = ExtendedProperties.TriggeredPlaybooks
| extend runId = tostring(TriggeredPlaybooks.RunId)
| join (AzureDiagnostics
| where OperationName == "Microsoft.Logic/workflows/workflowRunCompleted"
| extend IncidentNumber = toint(extract(@"[a-f0-9]{8}\-[a-f0-9]{4}\-[a-f0-9]{4}\-[a-f0-9]{4}\-[a-f0-9]{12}\_(\d+)", 1, correlation_clientTrackingId_s))
| project
IncidentNumber,
resource_runId_s,
playbookName = resource_workflowName_s,
playbookRunStatus = status_s)
on $left.runId == $right.resource_runId_s
| project
RecordId,
TimeGenerated,
AutomationRuleName= SentinelResourceName,
AutomationRuleStatus = Status,
Description,
workflowRunId = runId,
playbookName,
playbookRunStatus,
IncidentNumber
This is my workflow right now: the query runs and outputs to an HTML table, which gets emailed to me.
The issue is that I know several logic apps in my environment have failing or disconnected actions that are not captured by my logic app and KQL.
How would you tackle the problem of detecting and notifying when a logic app fails, both when the whole run fails and when a particular action fails?
If there are logic apps which are not appearing in your logs, make sure that their diagnostic settings send logs to your Log Analytics workspace:
Go to Logic apps > Monitoring > Diagnostic settings
Ensure that the diagnostic setting forwards the logs to the correct Log Analytics workspace.
Then you should be able to query the logs using this basic KQL:
AzureDiagnostics
| where OperationName endswith "workflowRunCompleted"
| summarize FailedRuns=countif(status_s == "Failed"), SuccessfulRuns=countif(status_s == "Succeeded") by LogicApp=resource_workflowName_s, ResourceGroup=resource_resourceGroupName_s
| extend PercentageFailed = round(100.0 * FailedRuns / (FailedRuns + SuccessfulRuns), 2)
You can adjust this as per your needs. Change line 2 to | where OperationName contains "Microsoft.Logic/workflows"
to include all logic app events being collected.
Another reason the logic apps may not be appearing is that they have not run within the time range you are querying.
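For the action-level part of the question, a query along the following lines might help. It is only a sketch: it assumes the same diagnostic settings also emit workflowActionCompleted events, and that your workspace has the resource_actionName_s, error_code_s and error_message_s columns. AzureDiagnostics columns vary between workspaces, so the error columns are read with column_ifexists and return an empty string if they are missing.

```kql
// Sketch: list failed actions per logic app over the last 7 days.
// Assumption: action events are logged with an OperationName ending in
// "workflowActionCompleted" and the action name is in resource_actionName_s.
AzureDiagnostics
| where TimeGenerated > ago(7d)
| where OperationName endswith "workflowActionCompleted"
| where status_s == "Failed"
| project
    TimeGenerated,
    LogicApp = resource_workflowName_s,
    ResourceGroup = resource_resourceGroupName_s,
    Action = resource_actionName_s,
    RunId = resource_runId_s,
    // Assumed column names; an empty string is returned if they do not exist.
    ErrorCode = column_ifexists("error_code_s", ""),
    ErrorMessage = column_ifexists("error_message_s", "")
| order by TimeGenerated desc
```

Because this reads AzureDiagnostics directly instead of joining on SentinelHealth, it should also pick up runs that were not started by a Sentinel automation rule, provided those logic apps have diagnostic settings enabled.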