I ran into some strange issues with traffic/connections; let me describe the problem below:
I have AWS EKS with the aws-load-balancer-controller configured.
I have several pods plus an Ingress (when the Ingress was added to EKS, an ALB was created successfully and it works without any issues, so I am good there).
Now I need to run a pod that listens on a plain TCP port, for example 5555, and route traffic to it. So I can't use an ALB; I need an NLB.
I found that an NLB can be created by setting specific annotations on the pod's Service. So I am using this config:
apiVersion: v1
kind: Service
metadata:
  name: {{ include "XXX.fullname" . }}
  labels:
    {{- include "XXX.labels" . | nindent 4 }}
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-subnets: subnet-xxx, subnet-yyy
    service.beta.kubernetes.io/aws-load-balancer-name: xx-nlb-xx
    service.beta.kubernetes.io/aws-load-balancer-security-groups: sg-xxx
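For reference, the spec part of this Service is not shown above. A minimal sketch of how it could look, assuming the pod listens on TCP 5555 and the chart exposes a "XXX.selectorLabels" helper (both assumptions on my side, not taken from the original chart):

spec:
  # with the aws-load-balancer-type: "external" annotation, the AWS Load Balancer
  # Controller (not the legacy in-tree controller) reconciles this Service
  type: LoadBalancer
  selector:
    {{- include "XXX.selectorLabels" . | nindent 4 }}   # assumed helper name
  ports:
    - name: tcp-5555
      protocol: TCP
      port: 5555
      targetPort: 5555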
So this configuration created an NLB for me. But when I try to send some data to my app, I get an error.
Here is the error (though I guess it does not matter much):
13:12:24.855 INFO - STORESCU->COINSDCMRCV(1) >> A-ASSOCIATE-RJ[result: 2 - rejected-transient, source: 3 - service-provider (Presentation related function), reason: 2 - local-limit-exceeded]
The most interesting thing I found is that I can send data successfully, but only during one specific time window.
When I redeploy my service, the NLB target is recreated (the new pod gets a new IP address, so the old target is deregistered and a new one is registered).
And during that window, while the old target is being removed and the new one is being registered, I can send data successfully! But as soon as the new target is registered and its status changes to Healthy, I start getting this error again.
So there is a window during target re-registration where everything works, but after that it does not work as expected.
What am I missing? What can I check? Any advice?
I found the reason.
My Python application (not mine, but one I am trying to deploy to K8s) had a default limit of 10 concurrent connections (and kept each connection open for 60 seconds before closing it).
So once the NLB health checks kicked in, all of these connections were busy: the health check opens a new TCP connection to the target every 10 seconds, the app held each connection for 60 seconds before closing it, and the checks come from several NLB nodes (one per enabled availability zone). The 10-connection limit was therefore exhausted almost immediately, and real clients were rejected with local-limit-exceeded.
That's it =)
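For anyone hitting the same thing: raising the application's connection limit is the real fix, but the health-check pressure can also be reduced from the Service side. The sketch below uses the AWS Load Balancer Controller's health-check annotations; the exact annotation set and the allowed values depend on your controller and NLB version, so verify against the docs before relying on it:

  annotations:
    # probe less frequently (NLB historically allowed only 10 or 30 second intervals)
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-interval: "30"
    # number of consecutive successful/failed probes before the target changes state
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"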