Log Analytics alert rule suppression option


We have 1000+ Azure Log Analytics workspace alert rules, created with ARM templates and Azure Pipelines across our different projects. Below is the ARM template used to create all of these alert rules; it has a parameter called "enabled" for enabling and disabling the alerts as needed.

To extend this automation, we want to give our developers a way to suppress specific alert rules for a limited time, from when they start addressing an issue until they resolve it, ideally by enhancing the same ARM template and pipeline approach. Once the suppression period is over, the alerts should return to the enabled state and trigger on new events.

{
    "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "actionGroupName": {
            "type": "string"
        },           
        "query": {
            "type": "string"
        },
        "logAnalyticsWorkspaceId": {
            "type": "string"
        },
        "AlertRuleName": {
            "type": "string"
        },    
        "tags": {
            "type": "object"
        },
        "schedule": {
            "type": "object"
        },       
        "severity": {
            "type": "int"           
        },                          
        "operator": {
            "type": "string"         
        },  
        "threshold": {
            "type": "int"
        },
        "autoMitigate": {
            "type": "string",
            "defaultValue": "false"
        },
        "enabled": {
            "type": "string"
        },        
        "customWebhookPayload": {
            "type": "object"
        },             
        "location": {
            "defaultValue": "[resourceGroup().location]",
            "type": "string"
        }    
    },
    "resources":[
        {
            "type":"Microsoft.Insights/scheduledQueryRules",
            "name": "[parameters('AlertRuleName')]",
            "apiVersion": "2018-04-16",
            "location": "[parameters('location')]",
            "tags": "[parameters('tags')]",            
            "properties":{
                "displayName": "[parameters('AlertRuleName')]",
                "description": "[parameters('AlertRuleName')]",
                "enabled": "[parameters('enabled')]",
                "source": {
                    "query": "[parameters('query')]",
                    "dataSourceId": "[parameters('logAnalyticsWorkspaceId')]",
                    "queryType":"ResultCount"
                },
                "schedule":"[parameters('schedule')]",
                "action":{
                    "odata.type": "Microsoft.WindowsAzure.Management.Monitoring.Alerts.Models.Microsoft.AppInsights.Nexus.DataContracts.Resources.ScheduledQueryRules.AlertingAction",
                    "severity": "[parameters('severity')]",
                    "aznsAction":{
                        "customWebhookPayload": "{ \"AlertRuleName\":\"#alertrulename\", \"AlertType\":\"#alerttype\", \"Severity\":\"#severity\", \"Application\":\"#{appname}#\", \"Text\":\"#alertrulename fired with #searchresultcount records. #{alertDescription}#\", \"SearchQuery\":\"#searchquery\" }",
                        "actionGroup": "[array(parameters('actionGroupName'))]"
                    },
                    "trigger":{
                        "thresholdOperator": "[parameters('operator')]",
                        "threshold": "[parameters('threshold')]"
                    }
                }
            }
        }
    ]
}

Solution

  • Here is a PowerShell script that disables the alert rules for the duration of a maintenance period and re-enables them automatically once the period is over.

    # Resource group and names of the alert rules to suppress
    $resourceGroupName = "Automation_RG"
    $alertRuleNames = @("Sample-Alert")

    # Note: Update-AzActivityLogAlert manages activity log alerts. For log search
    # alert rules (Microsoft.Insights/scheduledQueryRules), the corresponding
    # Az.Monitor cmdlet is Update-AzScheduledQueryRule.

    # Disable each alert rule in the list
    function Disable-AlertRules {
        param (
            [string[]]$AlertRules
        )
        foreach ($alertRuleName in $AlertRules) {
            Write-Host "Disabling alert rule: $alertRuleName"
            Update-AzActivityLogAlert -ResourceGroupName $resourceGroupName -Name $alertRuleName -Enabled $false
        }
    }

    # Re-enable each alert rule in the list
    function Enable-AlertRules {
        param (
            [string[]]$AlertRules
        )
        foreach ($alertRuleName in $AlertRules) {
            Write-Host "Enabling alert rule: $alertRuleName"
            Update-AzActivityLogAlert -ResourceGroupName $resourceGroupName -Name $alertRuleName -Enabled $true
        }
    }

    Disable-AlertRules -AlertRules $alertRuleNames
    # Wait for the maintenance period (in seconds)
    Start-Sleep -Seconds 60
    Enable-AlertRules -AlertRules $alertRuleNames

    Write-Host "Maintenance completed. All specified alert rules have been re-enabled."
    

    The script disables the alert rules, waits, and re-enables them once the maintenance period is over. Set the length of the maintenance period, in seconds, with the Start-Sleep command:

    Start-Sleep -Seconds 60
    

    Bash script

    #!/bin/bash
    resourceGroupName="Automation_RG"
    alertRuleNames=("Sample-Alert")
    
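    # Note: "activity-log alert" manages activity log alerts; log search alert rules
    # (Microsoft.Insights/scheduledQueryRules) use "az monitor scheduled-query" instead.

    # Function to disable the alert rules
    disable_alert_rules() {
        for alertRuleName in "${alertRuleNames[@]}"; do
            echo "Disabling alert rule: $alertRuleName"
            az monitor activity-log alert update -g "$resourceGroupName" -n "$alertRuleName" --enabled false
        done
    }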
    
    # Function to enable alert 
    enable_alert_rules() {
        for alertRuleName in "${alertRuleNames[@]}"; do
            echo "Enabling alert rule: $alertRuleName"
            az monitor activity-log alert update -g "$resourceGroupName" -n "$alertRuleName" --enabled true
        done
    }
    disable_alert_rules
    # Wait for the maintenance period (in seconds)
    sleep 60
    enable_alert_rules
    echo "Maintenance completed. All specified alert rules have been re-enabled."
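
Both scripts above call the activity log alert commands, whereas the template in the question creates Microsoft.Insights/scheduledQueryRules resources. A minimal sketch of the same disable, wait, re-enable flow for that resource type might look like the following. It assumes the Azure CLI "scheduled-query" command group (with its --disabled flag) is available in your CLI version and can manage rules created with the older API version, so verify that before relying on it. The helpers to_seconds, set_rules_disabled, and suppress_for are hypothetical names introduced here, and the resource group and rule names are the same placeholders as above.

```shell
#!/bin/bash
# Sketch only: suppress log search alert rules for a fixed window.
# Assumption: "az monitor scheduled-query update --disabled" is available.
resourceGroupName="Automation_RG"
alertRuleNames=("Sample-Alert")

# Convert a duration such as 90s, 30m, or 2h to seconds (hypothetical helper).
to_seconds() {
    local value="${1%[smh]}" unit="${1: -1}"
    [[ "$value" =~ ^[0-9]+$ ]] || { echo "invalid duration: $1" >&2; return 1; }
    case "$unit" in
        s) echo $(( 10#$value )) ;;
        m) echo $(( 10#$value * 60 )) ;;
        h) echo $(( 10#$value * 3600 )) ;;
        *) echo "unsupported duration: $1" >&2; return 1 ;;
    esac
}

# Set the disabled flag ("true" or "false") on every listed rule.
set_rules_disabled() {
    local disabled="$1" rule
    for rule in "${alertRuleNames[@]}"; do
        echo "Setting disabled=$disabled on alert rule: $rule"
        az monitor scheduled-query update -g "$resourceGroupName" -n "$rule" --disabled "$disabled"
    done
}

# Disable the rules, wait for the given duration, then re-enable them.
suppress_for() {
    local seconds
    seconds="$(to_seconds "$1")" || return 1
    # Re-enable the rules if the shell exits before the wait finishes.
    trap 'set_rules_disabled false' EXIT
    set_rules_disabled true
    sleep "$seconds"
    set_rules_disabled false
    trap - EXIT
}
```

Calling `suppress_for 30m` at the end of the script, or from a pipeline step, would keep the rules disabled for 30 minutes. A long sleep ties up the pipeline agent for the whole window, so for multi-hour suppressions a separately scheduled job that only runs `set_rules_disabled false` may be a better fit.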
    
    
