
How to scale down a specific instance in an AutoScalingGroup?


I am using Heat to implement auto-scaling. Below is a short excerpt of my code:

heat_template_version: 2016-10-14
...

resources:
    corey_server_group:
        type: OS::Heat::AutoScalingGroup
        depends_on: corey-server
        properties:
            min_size: 1
            max_size: 5
            resource:
                type: CoreyLBSServer.yaml
                properties:
......

CoreyLBSServer.yaml:

heat_template_version: 2016-10-14
...

resources:
    server:
        type: OS::Nova::Server
        properties:
            flavor:
......

I am looking for a way to scale down a specific instance. Here is what I've tried, but none of it worked: Heat always scales down the oldest instance.

1. Shut down the instance, then signal the scale-down policy. (X)
2. According to this, find the stack ID from the refs_map attribute, mark the server resource as unhealthy, then signal the scale-down policy. (X)
3. Find the stack ID from the refs_map attribute, set the stack status to FAILED, then signal the scale-down policy. (X)

I tried to find out which strategy AutoScalingGroup uses when scaling down. In heat/common/grouputils.py, members are sorted by "created_time" and then by name, so the oldest member is deleted first when scaling down. There is one exception: if include_failed is set, failed members are put first in the list, and each part of the list is still sorted by created_time and then by name.
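To make that ordering concrete, here is a minimal Python model of the behaviour described above. It is a sketch, not Heat's code: the Member class, the sorted_members function and the integer created_time values are hypothetical stand-ins for Heat's resource objects.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Member:
    # Hypothetical stand-in for a group member resource.
    name: str
    created_time: int  # simplified: an integer timestamp, smaller = older
    failed: bool = False


def sorted_members(members: List[Member], include_failed: bool = False) -> List[Member]:
    # Without include_failed, failed members are left out of the list entirely.
    candidates = [m for m in members if include_failed or not m.failed]
    # Key (not failed, created_time, name): False sorts before True, so failed
    # members come first; within each part, oldest first, then by name.
    return sorted(candidates, key=lambda m: (not m.failed, m.created_time, m.name))
```

For members a (oldest), b (failed) and c, the default call yields a, c, while include_failed=True yields b, a, c.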

Update

I finally managed to mark my target as "failed". Here are the commands:

# 1. Print the physical_resource_id of corey_server_group (the ID of its nested stack)
openstack stack resource show -c physical_resource_id <parent_stack_id> corey_server_group

# 2. List the resources (group members) in the nested stack
openstack stack resource list <physical_resource_id>

# 3. Mark the target as unhealthy
openstack stack resource mark unhealthy <physical_resource_id> <resource_name>

# After these commands, the resource_status of the target becomes "Check Failed"

But there is another problem: when scaling down, Heat deletes both the "failed" resource and the oldest one! How can I scale down only the target marked as failed?


Solution

  • After a few days of tracing, I finally found a way to scale down a specific instance in the AutoScalingGroup.

    Let's look at the source code first: heat/common/grouputils.py#L114

    The docstring reads: "Sort the list of instances first by created_time then by name. If include_failed is set, failed members will be put first in the list sorted by created_time then by name."

    As you can see, include_failed defaults to False, so unhealthy members are not included in the list. That is why the procedure described in my question didn't work.

    To scale down a particular instance, you must explicitly pass include_failed=True when calling these functions. Below is part of my code:

    heat/engine/resources/aws/autoscaling/autoscaling_group.py [screenshot of the modified code]

    Because I'm using AutoScalingGroup, I needed to modify two files:
    heat/engine/resources/aws/autoscaling/autoscaling_group.py
    heat/engine/resources/openstack/heat/autoscaling_group.py

    Restart the Heat services. You can then mark the target as unhealthy and signal the policy to scale down that specific instance:

    openstack stack resource mark unhealthy <physical_resource_id> <resource_name>
    openstack stack resource signal <parent_stack_id> your_scaledown_policy
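The lookup from the Update section and the two commands above can be scripted. Below is a minimal Python sketch that shells out to the same openstack CLI commands via subprocess. It assumes the CLI is installed and already authenticated, and that Heat has been patched as described; the function names are hypothetical, and `-f value` only makes the show command print the bare ID.

```python
import subprocess
from typing import List


def show_physical_id_cmd(parent_stack: str, group: str) -> List[str]:
    # The group's physical_resource_id is the ID of its nested stack.
    return ["openstack", "stack", "resource", "show",
            "-c", "physical_resource_id", "-f", "value", parent_stack, group]


def list_members_cmd(nested_stack: str) -> List[str]:
    # List the group's members to find the target's resource name.
    return ["openstack", "stack", "resource", "list", nested_stack]


def mark_unhealthy_cmd(nested_stack: str, resource_name: str) -> List[str]:
    # Mark the chosen member unhealthy inside the nested stack.
    return ["openstack", "stack", "resource", "mark", "unhealthy",
            nested_stack, resource_name]


def signal_cmd(parent_stack: str, policy: str) -> List[str]:
    # Signal the scale-down policy defined in the parent stack.
    return ["openstack", "stack", "resource", "signal", parent_stack, policy]


def run(cmd: List[str]) -> str:
    # Raises CalledProcessError on failure; returns stdout without surrounding whitespace.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def scale_down_member(parent_stack: str, group: str,
                      resource_name: str, policy: str) -> None:
    # Hypothetical helper: removes only the chosen member on a Heat that has
    # been patched to pass include_failed=True; stock Heat also removes the oldest.
    nested_stack = run(show_physical_id_cmd(parent_stack, group))
    run(mark_unhealthy_cmd(nested_stack, resource_name))
    run(signal_cmd(parent_stack, policy))
```

Usage, with the placeholders from the commands above: `scale_down_member("<parent_stack_id>", "corey_server_group", "<resource_name>", "your_scaledown_policy")`. The resource name still has to be picked from the output of `run(list_members_cmd(nested_stack))`.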
    

    FYI, the table below compares the behavior of include_failed=False and include_failed=True (scaling_adjustment=1).

                       | include_failed=False (default) | include_failed=True
    -------------------+--------------------------------+--------------------------
    Scale Up           | Add one instance               | Add one instance
    Scale down         | Remove the oldest              | Remove the oldest
    Stack Update       | Nothing changed                | Nothing changed
    Unhealthy + Up     | Add one & remove unhealthy     | Add one & fix unhealthy
    Unhealthy + Down   | Remove one & remove unhealthy  | Remove the unhealthy one
    Unhealthy + Update | Fix unhealthy                  | Fix unhealthy
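The scale-up and scale-down rows of the table can be reproduced with a toy model. The sketch below is hypothetical and much simpler than Heat: it assumes the current capacity is the length of the sorted member list (with or without failed members, depending on include_failed), that the new group definition keeps the tail of that list, and that any existing member left out of the definition is deleted. Member, sorted_members and adjust are illustrative names, and the Stack Update rows are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Member:
    name: str
    created_time: int  # smaller = older
    failed: bool = False


def sorted_members(members: List[Member], include_failed: bool) -> List[Member]:
    # Failed members are dropped unless included; included ones sort first.
    candidates = [m for m in members if include_failed or not m.failed]
    return sorted(candidates, key=lambda m: (not m.failed, m.created_time, m.name))


def adjust(members: List[Member], adjustment: int,
           include_failed: bool = False) -> Tuple[List[str], List[str], List[str]]:
    """Return (kept, removed, added) member names after a scaling adjustment."""
    ordered = sorted_members(members, include_failed)
    # Assumption: current capacity = number of members in the sorted list.
    new_capacity = max(len(ordered) + adjustment, 0)
    # Keep the newest members (the tail); failed-then-oldest go first.
    kept = ordered[max(len(ordered) - new_capacity, 0):]
    added = [f"new{i}" for i in range(1, new_capacity - len(kept) + 1)]
    kept_names = [m.name for m in kept]
    # Members missing from the new definition (including excluded failed ones)
    # are deleted; a failed member that is kept would be updated, i.e. "fixed".
    removed = [m.name for m in members if m.name not in kept_names]
    return kept_names, removed, added
```

With a failed member B between A (oldest) and C, adjust(members, -1) returns (["C"], ["A", "B"], []), matching "Remove one & remove unhealthy", while adjust(members, -1, include_failed=True) returns (["A", "C"], ["B"], []), matching "Remove the unhealthy one".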