Connection refused between kube-proxy and nginx backend

Tags: nginx, kubernetes, kube-proxy


We are regularly seeing connection refused errors on a bespoke NGINX reverse proxy running in AWS EKS (see the Kubernetes template below).

Initially, we thought it was an issue with the load balancer. However, upon further investigation, there seems to be an issue between kube-proxy and the nginx Pods.

When I run repeated wget IP:PORT requests against just the node's internal IP and the NodePort it serves on, we see 400 Bad Request several times and then, eventually, a failed: Connection refused.

Whereas when I run the same request directly against the Pod IP and port, I never get this connection refused (see the direct Pod check after the example output below).

Example wget output

Fail:

wget ip.ap-southeast-2.compute.internal:30102
--2020-06-26 01:15:31--  http://ip.ap-southeast-2.compute.internal:30102/
Resolving ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)... 10.1.95.3
Connecting to ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)|10.1.95.3|:30102... failed: Connection refused.

Success:

wget ip.ap-southeast-2.compute.internal:30102
--2020-06-26 01:15:31--  http://ip.ap-southeast-2.compute.internal:30102/
Resolving ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)... 10.1.95.3
Connecting to ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)|10.1.95.3|:30102... connected.
HTTP request sent, awaiting response... 400 Bad Request
2020-06-26 01:15:31 ERROR 400: Bad Request.
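
For comparison, this is roughly how we test against a Pod directly (a sketch; the Pod IP below is illustrative, and 8080 is the containerPort from the template):

kubectl get pods -l app=nginx-lua-ssl -o wide   # list Pod IPs
wget 10.1.95.17:8080                            # example Pod IP; connects every time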

In the NGINX logs, we see no trace of the refused connections, whereas the 400 Bad Request ones do show up.
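
For reference, we check the logs along these lines (a sketch; kubectl can fetch logs by label selector):

kubectl logs -l app=nginx-lua-ssl --tail=100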

I have read about several known issues regarding kube-proxy and am interested in any other insights into improving this situation, e.g.:

https://github.com/kubernetes/kubernetes/issues/38456
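
One check suggested for this class of problem is to inspect the iptables rules that kube-proxy programs for the NodePort, directly on the node (assuming kube-proxy runs in the default iptables mode; 30102 is our NodePort):

# run on the node, as root
iptables-save | grep 30102      # KUBE-NODEPORTS rules for the port
iptables-save | grep KUBE-SEP   # per-endpoint DNAT rules; one chain per backend Pod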

Any help much appreciated.

Kubernetes Template

##
# Main nginx deployment. The docker image tag may need to be
# updated.
##
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-lua-ssl-deployment
  labels:
    service: https-custom-domains
spec:
  selector:
    matchLabels:
      app: nginx-lua-ssl
  replicas: 5
  template:
    metadata:
      labels:
        app: nginx-lua-ssl
        service: https-custom-domains
    spec:
      containers:
      - name: nginx-lua-ssl
        image: "0000000000.dkr.ecr.ap-southeast-2.amazonaws.com/lua-resty-auto-ssl:v0.NN"
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
        - containerPort: 8443
        - containerPort: 8999
        envFrom:
        - configMapRef:
            name: https-custom-domain-conf

##
# Load balancer which manages traffic into the nginx Pods.
# In AWS, the annotation below provisions an NLB (Network Load
# Balancer).
##
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  name: nginx-lua-load-balancer
  labels:
    service: https-custom-domains
spec:
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: https
    port: 443
    targetPort: 8443
  externalTrafficPolicy: Local
  selector:
    app: nginx-lua-ssl
  type: LoadBalancer
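
For completeness, since type: LoadBalancer allocates the NodePorts automatically, this is how the NodePort (30102 above) and the Service's backend endpoints can be looked up (service name as in the template):

kubectl get svc nginx-lua-load-balancer -o jsonpath='{.spec.ports[*].nodePort}'
kubectl get endpoints nginx-lua-load-balancer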

Solution

  • In the end, this issue was caused by a Pod that was configured incorrectly, such that the load balancer was routing traffic to it:

    selector:
      matchLabels:
        app: redis-cli
    

    There were 5 nginx Pods correctly receiving traffic, and one utility Pod incorrectly receiving traffic and refusing the connections, as you would expect; a quick way to spot this is sketched below.

    Thanks for the responses.
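
    For anyone hitting something similar: list exactly the Pods the Service selector matches and compare them with what you expect (selector value from the Service above):

    kubectl get pods -l app=nginx-lua-ssl --show-labels
    # any unexpected Pod listed here (e.g. a utility Pod) will receive
    # its share of the load-balanced traffic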