amazon-web-services terraform

count value depends on resource attributes that cannot be determined until apply in module


I've seen this question asked elsewhere but I don't think it quite addresses the issue I'm having.

We're using a module to create an assumable role:

module "service_task_role" {
  source                          = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version                         = "5.32.0"
  create_custom_role_trust_policy = true
  create_role                     = true
  role_name                       = "${local.full_name}-task-role"
  role_requires_mfa               = false
  custom_role_trust_policy        = data.aws_iam_policy_document.task_trust_policy_document.json
  custom_role_policy_arns = concat(
    [var.task_policy_arn],
    [for rule in module.rds_security_group_rules : rule.policy_arn]
  )
  number_of_custom_role_policy_arns = 1 + (var.database_connection == null ? 0 : 1)
  tags                              = module.service_iam_label.tags
}

This worked fine until we realized that occasionally we needed a role created without a task_policy_arn. I modified this code to look like this:

locals {
  has_task_policy     = var.task_policy_arn != null
  num_custom_policies = (local.has_task_policy ? 1 : 0) + (var.database_connection == null ? 0 : 1)
}

module "service_task_role" {
  source                          = "terraform-aws-modules/iam/aws//modules/iam-assumable-role"
  version                         = "5.32.0"
  create_custom_role_trust_policy = true
  create_role                     = true
  role_name                       = "${local.full_name}-task-role"
  role_requires_mfa               = false
  custom_role_trust_policy        = data.aws_iam_policy_document.task_trust_policy_document.json
  custom_role_policy_arns = concat(
    local.has_task_policy ? [var.task_policy_arn] : [],
    [for rule in module.rds_security_group_rules : rule.policy_arn]
  )
  number_of_custom_role_policy_arns = local.num_custom_policies
  tags                              = module.service_iam_label.tags
}

and now the plan fails. Looking out a bit wider, this code is part of a module which creates an ECS task. We typically call this module like so:

module "service_policy" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-policy"
  version = "5.32.0"

  count = local.create_service_policy ? 1 : 0

  name        = "service-${var.application}-policy"
  path        = "/"
  description = "Policy for the ${title(var.application)} service"

  policy = jsonencode({
    "Version" : "2012-10-17",
    "Statement" : concat(var.parameter_store_access || local.has_local_secrets ? [
      {
        "Action" : [
          "ssm:GetParameter",
          "ssm:GetParameters",
          "ssm:GetParametersByPath",
        ],
        "Effect" : "Allow",
        "Resource" : [
          "arn:aws:ssm:${var.region}:${local.aws_account_id}:parameter/*",
        ]
      }
      ] : [],
      local.secrets_manager_config,
      var.custom_policies,
    )
  })

  tags = module.service_label.tags
}

module "shared_ecs" {
  source  = "app.terraform.io/My-Org/aws//modules/ecs"
  version = "2.0.7"

  # Define the core environment variables
  region      = var.region
  environment = var.environment

  # Define the details associated with the load balancer and networking
  listener_priority = var.priority
  service_port      = var.port
  slow_start        = var.slow_start
  short_name        = var.short_name
  health_check      = local.health_check_endpoint

  # Define the scopes for the API
  scopes = var.scopes

  # Define the details associated with the ECS task
  task_policy_arn = local.create_service_policy ? module.service_policy[0].arn : null
  desired_count   = local.instance_count
  cpu             = var.cpu
  memory          = var.memory

  # Define the RDS instance details if provided
  database_connection = var.database_connection
}

In this code, we should know whether module.service_policy[0] exists once the Terraform plan completes, because local.create_service_policy depends only on variable inputs. Since this is the top-level module, those are all constant. Thus, I don't think I should be seeing an error, but when I try to run a plan I get the following:

╷
│ Error: Invalid count argument
│
│   on .terraform/modules/service.shared_ecs.service_task_role/modules/iam-assumable-role/main.tf line 165, in resource "aws_iam_role_policy_attachment" "custom":
│  165:   count = var.create_role ? coalesce(var.number_of_custom_role_policy_arns, length(var.custom_role_policy_arns)) : 0
│
│ The "count" value depends on resource attributes that cannot be determined
│ until apply, so Terraform cannot predict how many instances will be
│ created. To work around this, use the -target argument to first apply only
│ the resources that the count depends on.
╵

I thought I could create two instances of service_task_role, one including var.task_policy_arn and one without it, but that would also involve the use of count, so it seems like that would result in the same issue.

EDIT: I was able to fix the error by adding a moved block for all the existing callers of this module:

moved {
  from = module.service["some_service"].module.service_policy.aws_iam_policy.policy[0]
  to   = module.service["some_service"].module.service_policy[0].aws_iam_policy.policy[0]
}

Now I'm even more confused.

Does anyone know why this is happening and what I can do to resolve the issue?


Solution

  • The essence of the change you made is that the number of elements in custom_role_policy_arns now depends on whether var.task_policy_arn is null. This means that Terraform can only determine the length of that collection once the "nullness" of the task policy ARN has been decided.

    Unfortunately, Terraform's support for tracking whether a not-yet-decided value could potentially be null is relatively recent (Terraform v1.6 in late 2023) and so large providers like hashicorp/aws have not been fully updated to be able to tell Terraform whether their exported values are nullable or not.

    The most likely explanation for the behavior you encountered, then, is that var.task_policy_arn is derived from the arn attribute of some resource instance in the service_policy module that hasn't been created yet, and so the provider is reporting that this value (and its nullness) won't be decided until the apply phase, once the remote object has been created.

    If you know that in practice this ARN value cannot possibly be null (which is true for most arn attributes in the hashicorp/aws provider) then until the provider is updated to report this properly itself you can work around it by giving Terraform some more information to help it to better understand the situation.

    For example, in the module "shared_ecs" block you could write the definition of task_policy_arn like this:

      task_policy_arn = (
        local.create_service_policy ?
        coalesce(module.service_policy[0].arn) :
        null
      )
    

    By definition the coalesce function can never produce a null value -- it returns an error if all of the given arguments are null -- and so Terraform automatically infers that any value derived from the result of that function cannot be null even if the value is otherwise unknown.
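
    The effect can be reproduced in isolation. The following is a minimal sketch, assuming Terraform v1.6 or later and assuming that the output attribute of the built-in terraform_data resource is unknown during the plan that creates it. The names source, counted, raw_value and non_null_value are hypothetical and are not part of the configuration in the question:

    ```hcl
    terraform {
      required_version = ">= 1.6.0"
    }

    # Stand-in for a resource whose attribute is only known after apply.
    resource "terraform_data" "source" {
      input = "example"
    }

    locals {
      # Unknown during the first plan, and Terraform cannot tell whether
      # it might turn out to be null.
      raw_value = terraform_data.source.output

      # Still unknown, but coalesce marks its result as definitely not null.
      non_null_value = coalesce(terraform_data.source.output)
    }

    resource "terraform_data" "counted" {
      # Comparing non_null_value with null can be decided at plan time.
      # Using local.raw_value here instead is expected to fail with
      # "Invalid count argument" on the first plan.
      count = local.non_null_value != null ? 1 : 0

      input = local.non_null_value
    }
    ```

    Only the nullness is decided early here; the value itself still shows as "(known after apply)" in the plan.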

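    This means that inside the shared_ecs module Terraform can assume that var.task_policy_arn is not null, meaning that local.has_task_policy is known to be true during planning, and so both local.num_custom_policies and the number of task policy elements in custom_role_policy_arns can be determined even though the ARN itself is not yet known.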
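
    Annotated with what Terraform can decide during planning, the module-side expressions from the question look roughly like this. This is a sketch only: the variable declarations are assumptions, since the module's real declarations are not shown, and task_policy_arns is a hypothetical local added to show the list on its own:

    ```hcl
    # Assumed declarations; the real ones are not shown in the question.
    variable "task_policy_arn" {
      type    = string
      default = null
    }

    variable "database_connection" {
      type    = any
      default = null
    }

    locals {
      # When the caller passes an unknown value that is marked as not null,
      # this comparison is already known to be true during planning.
      has_task_policy = var.task_policy_arn != null

      # Both operands are known, so the sum is known.
      num_custom_policies = (local.has_task_policy ? 1 : 0) + (var.database_connection == null ? 0 : 1)

      # Known to have exactly one element in that case; the element's
      # value stays unknown until apply.
      task_policy_arns = local.has_task_policy ? [var.task_policy_arn] : []
    }
    ```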


    There's some more background context on the current situation in hashicorp/terraform-plugin-framework#869, in case that's interesting. That issue discusses implementation details behind the problem rather than the surface-level problem itself, but I'm linking to it here just because I think that's the most likely place that any progress on this problem would be reported, so if someone finds this question in the far future they can learn if my answer is still valid or if the situation has changed in the meantime.
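
    If the coalesce hint is not an option (for example on Terraform versions older than v1.6), a commonly used alternative is to pass the decision into the module as its own value that is known at plan time, rather than deriving it from the ARN. A minimal sketch, assuming a hypothetical attach_task_policy variable is added to the shared_ecs module; neither this variable nor the declaration of task_policy_arn appears in the question:

    ```hcl
    # Hypothetical new input: must be derived only from plan-time-known values.
    variable "attach_task_policy" {
      type        = bool
      description = "Whether a task policy ARN will be supplied."
    }

    # Assumed declaration of the existing input.
    variable "task_policy_arn" {
      type    = string
      default = null
    }

    locals {
      # Depends only on the boolean, never on the (possibly unknown) ARN.
      has_task_policy = var.attach_task_policy

      # The length is known at plan time even while the ARN is unknown.
      task_policy_arns = local.has_task_policy ? [var.task_policy_arn] : []
    }
    ```

    The caller would then set attach_task_policy = local.create_service_policy alongside the existing task_policy_arn argument.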