amazon-web-services amazon-ecs nlb

Intermittent timeout / empty reply from server with AWS ECS and NLB


I have the architecture shown below. My ECS cluster runs in 2 private subnets and has 2 EC2 instances. Each EC2 instance runs 2 services using the ECS DAEMON scheduling strategy. Both services have an NLB attached; I am using an NLB because I want to create a VPC endpoint in another VPC. When service 1 connects to service 2 through the NLB, I intermittently get "empty reply from server". After searching, I thought HTTPS might be the issue, so I removed it, but now I get intermittent timeouts on HTTP requests instead.

This happens only when I go through the NLB from the ECS EC2 instances. I created a jump host in the same subnet, and from that jump host it works fine.

The ECS EC2 instances have a security group that allows traffic on the service 1 and service 2 ports from the entire VPC.

[Image: architecture diagram]

NLB Terraform:

resource "aws_lb" "demo" {
  name                             = "demo-shared-nlb-${var.environment}"
  internal                         = true
  load_balancer_type               = "network"
  enable_cross_zone_load_balancing = true
  enable_deletion_protection       = false
  subnets = [
    aws_subnet.shared_private_subnet_1a.id,
    aws_subnet.shared_private_subnet_1b.id
  ]

  tags = merge(
    var.tags,
    {
      "Name" = "demo-shared-nlb-${var.environment}"
    },
  )
}

Solution

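  • I had a call with an AWS support representative, who suggested the following changes. They fixed the problem for me:

    1. Disable the "Preserve client IP addresses" attribute on the NLB target group. It is enabled by default for targets registered by instance ID.
    2. Attach a security group to the NLB itself, in addition to the one on the EC2 instances.

  • The likely explanation is hairpinning. With client IP preservation enabled, a request that the NLB routes back to the same instance it came from has identical source and destination addresses, so the connection times out, while a request routed to the other instance succeeds. That matches the intermittent failures from the ECS instances and the clean results from the jump host.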
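For anyone applying the two changes with Terraform, here is a minimal sketch of how they might look. It is not my exact configuration: the target group name, the port 8080, and the variables var.vpc_id and var.vpc_cidr are placeholders I made up for illustration. Note that the AWS provider only accepts security_groups on a network load balancer in reasonably recent versions, and adding a security group to an NLB that was created without one forces the load balancer to be recreated.

```hcl
# Sketch only: names, port 8080, var.vpc_id and var.vpc_cidr are hypothetical.

# Step 2: a security group for the NLB itself.
resource "aws_security_group" "nlb" {
  name   = "demo-shared-nlb-${var.environment}"
  vpc_id = var.vpc_id # assumed variable

  ingress {
    description = "Service traffic from inside the VPC"
    from_port   = 8080 # placeholder listener port
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr] # assumed variable
  }

  egress {
    description = "Forward traffic to targets in the VPC"
    from_port   = 8080
    to_port     = 8080
    protocol    = "tcp"
    cidr_blocks = [var.vpc_cidr]
  }
}

resource "aws_lb" "demo" {
  name                             = "demo-shared-nlb-${var.environment}"
  internal                         = true
  load_balancer_type               = "network"
  enable_cross_zone_load_balancing = true
  enable_deletion_protection       = false

  # Attach the security group defined above to the NLB.
  security_groups = [aws_security_group.nlb.id]

  subnets = [
    aws_subnet.shared_private_subnet_1a.id,
    aws_subnet.shared_private_subnet_1b.id
  ]
}

# Step 1: turn off client IP preservation on the target group.
resource "aws_lb_target_group" "service_2" {
  name        = "demo-service-2-${var.environment}" # placeholder name
  port        = 8080                                # placeholder port
  protocol    = "TCP"
  target_type = "instance"
  vpc_id      = var.vpc_id

  # Targets now see the NLB's private IP as the source address,
  # so a request that lands on its own instance no longer hairpins.
  preserve_client_ip = false
}
```

With client IP preservation off, the instances see the NLB's private addresses as the source, so the instance security group has to allow the service ports from the NLB (its security group or the VPC CIDR). In my case the existing rule allowing the whole VPC already covered that.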