When putting together a Fault Injection Service (FIS) experiment in AWS, I want to target all running tasks that are part of an ECS service. Targeting every task in a service is straightforward, but things break if any task in the service has stopped for any reason. That's because FIS calls the Describe or List API that corresponds to the resource type you're targeting; for ECS tasks, that's the DescribeTasks API. DescribeTasks (like other APIs that describe similar resources) keeps reporting stopped tasks/instances for up to an hour. That makes the FIS experiment really grumpy, because those ECS tasks are no longer registered with Session Manager.
I'm using Terraform, and the documentation around using filters isn't quite as clear as I'd like it to be. CloudFormation takes an array of filters, while Terraform appears to expect a separate filter block for each path you check (it's not super clear, and examples that use multiple filters are really hard to find).
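For what it's worth, Terraform's repeated `filter` blocks play the role of CloudFormation's `Filters` array, and a `dynamic` block can generate them from a map so you don't have to write each one out. This is a sketch, not a tested template: `local.task_filters` is a hypothetical name, the required `description`/`role_arn` arguments plus the stop condition and action are omitted, and the paths are written in Pascal case per the FIS documentation's naming rule for filter paths.

```hcl
locals {
  # Hypothetical map of FIS filter path => allowed values
  task_filters = {
    DesiredStatus = ["RUNNING"]
    LastStatus    = ["RUNNING"]
  }
}

resource "aws_fis_experiment_template" "my_experiment" {
  # description, role_arn, stop_condition and action omitted for brevity

  target {
    name           = "my_targets"
    resource_type  = "aws:ecs:task"
    selection_mode = "ALL"

    resource_tag {
      key   = "aws:ecs:serviceName"
      value = "my-service-name"
    }

    # Emits one filter block per map entry, the Terraform
    # counterpart of CloudFormation's Filters array
    dynamic "filter" {
      for_each = local.task_filters
      content {
        path   = filter.key
        values = filter.value
      }
    }
  }
}
```

The rendered plan should be equivalent to writing the two `filter` blocks by hand; the map just keeps path/value pairs in one place.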
Given my current understanding of the documentation, my target object should be defined within the experiment something like this:
```hcl
resource "aws_fis_experiment_template" "my_experiment" {
  stop_condition {
    source = "none"
  }

  action {
    # action data here ...
  }

  target {
    name           = "my_targets"
    resource_type  = "aws:ecs:task"
    selection_mode = "ALL"

    resource_tag {
      key   = "aws:ecs:serviceName"
      value = "my-service-name"
    }

    # Try this status
    filter {
      path   = "desiredStatus"
      values = ["RUNNING"]
    }

    # and this status
    filter {
      path   = "lastStatus"
      values = ["RUNNING"]
    }
  }
}
```
Not using any filters targets all tasks in my ECS service as expected, including stopped tasks (which cause the experiment to fail). So I'm trying to filter on the desiredStatus and/or lastStatus properties from the DescribeTasks API. Both appear to be top-level properties of each object in the tasks array that DescribeTasks returns, so the property path is just the property name. Since these two properties report slightly different things, and to make the FIS experiment as reliable as possible, I'd ideally filter on both to make sure each is set to RUNNING. However, the target always resolves to an empty set, even if I supply only one of the two filters. What is wrong with the filtering in this example?
After reading through the docs more thoroughly, it seems this might be one of the problems:
> Each element must be expressed in Pascal case, even if the output of the Describe action for a resource is in camel case. For example, you should use AvailabilityZone, not availabilityZone, as an attribute element.
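If you're translating several field names from the camelCase DescribeTasks output, a small `locals` sketch like this can do the conversion inside Terraform. The `describe_fields` and `fis_filter_paths` names are just illustrative. It only upper-cases the first character, so a dotted path to a nested attribute would need each segment converted separately.

```hcl
locals {
  # camelCase field names as they appear in the DescribeTasks response
  describe_fields = ["desiredStatus", "lastStatus"]

  # Upper-case the first character to get the Pascal-case path FIS expects;
  # substr(f, 1, -1) returns everything after the first character
  fis_filter_paths = [
    for f in local.describe_fields : "${upper(substr(f, 0, 1))}${substr(f, 1, -1)}"
  ]
}

output "fis_filter_paths" {
  value = local.fis_filter_paths # ["DesiredStatus", "LastStatus"]
}
```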
Based on the Pascal-case rule quoted from the docs, I would give it a try with:
```hcl
resource "aws_fis_experiment_template" "my_experiment" {
  stop_condition {
    source = "none"
  }

  action {
    # action data here...
  }

  target {
    name           = "my_targets"
    resource_type  = "aws:ecs:task"
    selection_mode = "ALL"

    resource_tag {
      key   = "aws:ecs:serviceName"
      value = "my-service-name"
    }

    # Try this status
    filter {
      path   = "DesiredStatus" # Pascal case, not camel case (desiredStatus)
      values = ["RUNNING"]
    }

    # and this status
    filter {
      path   = "LastStatus" # Pascal case, not camel case (lastStatus)
      values = ["RUNNING"]
    }
  }
}
```
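On the "filter on both" part of the question: as I read the FIS target-filter documentation, separate `filter` blocks are combined with AND, while multiple entries in one block's `values` list are combined with OR. If that reading is right, the two-block version above only selects tasks whose desired and last status are both `RUNNING`, which is what you want here. For contrast, this untested fragment (for inside the `target` block) shows the OR form; `PENDING` is only an illustrative second value, and accepting it would widen the selection beyond strictly running tasks.

```hcl
# A task matches if its LastStatus is RUNNING *or* PENDING
# (assumption: values within one filter are ORed)
filter {
  path   = "LastStatus"
  values = ["RUNNING", "PENDING"]
}
```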