azureazure-web-app-serviceautoscalingazure-app-service-plans

Azure App Service Autoscale Fails to Scale In


My app service has failed to scale-in after scaling-out. This seems to be a pattern I've been trying to troubleshoot for several months.

I've tried the following but none have worked:

My scale condition is based on CPU and memory. However, I've never seen CPU go past 12%, so I'm assuming it's actually scaling based on memory.

  1. Set the scale out condition to memory over 90% over a 5 minute average with 10 min. cooldown and scale in condition for memory under 70% over a 5 minute average. This doesn't seem to make sense since if my memory utilization is already at 90%, I'm really having underlying memory leaks and should have already scaled out.

  2. Set the scale out condition to memory over 80% over a 60 minute average with 10 min. cooldown and scale in condition for memory under 60% over a 5 minute average. This makes more sense, as I've seen memory usage burst over a few hours only to drop.

enter image description here

Expected behavior: App service autoscaling will reduce instance count after 5 minutes where memory usage drops below 60%.

Question:

What is the ideal threshold on a metric to scale smoothly by if my baseline CPU remains roughly at an average of 6% and memory at 53%? Meaning, what is the best minimum values to scale in by and best max values to scale out without worrying about anti-patterns such as flapping? A larger threshold of 20% difference makes more sense to me.

Alternative solution:

Given the amount of troubleshooting involved with what's marketed as as simple as "push button scaling", makes it almost not even worth the headache of the configuration vagueness (you can't even use IIS metrics like connection count without a custom powershell script!). I'm considering disabling autoscaling because of its unpredictability and just keep 2 instances running for automatic load balancing and scale manually.

Autoscale Configuration:

{
    "location": "East US 2",
    "tags": {
        "$type": "Microsoft.WindowsAzure.Management.Common.Storage.CasePreservedDictionary, Microsoft.WindowsAzure.Management.Common.Storage"
    },
    "properties": {
        "name": "CPU and Memory Autoscale",
        "enabled": true,
        "targetResourceUri": "/redacted",
        "profiles": [
            {
                "name": "Auto created scale condition",
                "capacity": {
                    "minimum": "1",
                    "maximum": "10",
                    "default": "1"
                },
                "rules": [
                    {
                        "scaleAction": {
                            "direction": "Increase",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT10M"
                        },
                        "metricTrigger": {
                            "metricName": "MemoryPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "GreaterThanOrEqual",
                            "statistic": "Average",
                            "threshold": 80,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT1H"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Decrease",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "MemoryPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "LessThanOrEqual",
                            "statistic": "Average",
                            "threshold": 60,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT10M"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Increase",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "CpuPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "GreaterThanOrEqual",
                            "statistic": "Average",
                            "threshold": 60,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT1H"
                        }
                    },
                    {
                        "scaleAction": {
                            "direction": "Decrease",
                            "type": "ChangeCount",
                            "value": "1",
                            "cooldown": "PT5M"
                        },
                        "metricTrigger": {
                            "metricName": "CpuPercentage",
                            "metricNamespace": "",
                            "metricResourceUri": "/redacted",
                            "operator": "LessThanOrEqual",
                            "statistic": "Average",
                            "threshold": 40,
                            "timeAggregation": "Average",
                            "timeGrain": "PT1M",
                            "timeWindow": "PT10M"
                        }
                    }
                ]
            }
        ],
        "notifications": [
            {
                "operation": "Scale",
                "email": {
                    "sendToSubscriptionAdministrator": false,
                    "sendToSubscriptionCoAdministrators": false,
                    "customEmails": [
                        "redacted"
                    ]
                },
                "webhooks": []
            }
        ],
        "targetResourceLocation": "East US 2"
    },
    "id": "/redacted",
    "name": "CPU and Memory Autoscale",
    "type": "Microsoft.Insights/autoscaleSettings"
}

Solution

  • I have exactly the same problem and I've come to believe that autoscaling back to one instance like we want to do it currently not possible.

    My current workaround is to scale in to 1 instance with a second profile that repeats every day between 23:55 and 00:00.

    Just to reiterate the problem. I have the following scenario. It is basically identical to yours.

    Scaling out from 1 instance to 2 instances will work correctly when the average memory percentage exceeds 80%. But scaling in to 1 instance will never work because the memory baseline is too high.

    After reading the Best Practices, my understanding is that when scaling in, it will estimate the resulting memory percentage and check if no scale out rule is triggered.

    So if the average memory percentage drops to 50% for two instances the scale in rule is triggered and it will estimate the resulting memory usage to be 2 * 50% / 1 = 100% which will of course trigger the scale out rule and thus it will not scale in.

    It should however work when scaling from 3 to 2 instances: 3 * 50% / 2 = 75% which is smaller than the 80% of the scale out rule.