We are regularly seeing connection refused errors on a bespoke NGINX reverse proxy running in AWS EKS (see the Kubernetes template below).
Initially we thought it was an issue with the load balancer. However, on further investigation, the problem appears to sit between kube-proxy and the nginx Pod.
When I run repeated wget IP:PORT requests against a node's internal IP and the NodePort that serves nginx, I see 400 Bad Request several times (which at least shows the request reached nginx) and then, eventually, a failed: Connection refused.
When I run the same requests directly against the Pod IP and port, I never get this connection refused.
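For reference, a minimal loop that reproduces this (a sketch; the hostname and NodePort are the ones from the output below, and POD_IP is a placeholder for the Pod's address):

for i in $(seq 1 50); do
  # A 400 response means the request reached nginx; "Connection refused" means it never did.
  wget -O /dev/null ip.ap-southeast-2.compute.internal:30102 2>&1 | grep -E 'refused|400'
done

# The same loop against the Pod IP and containerPort 8080 never shows a refusal:
for i in $(seq 1 50); do
  wget -O /dev/null POD_IP:8080 2>&1 | grep -E 'refused|400'
done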
Example wget output:
Fail:
wget ip.ap-southeast-2.compute.internal:30102
--2020-06-26 01:15:31-- http://ip.ap-southeast-2.compute.internal:30102/
Resolving ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)... 10.1.95.3
Connecting to ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)|10.1.95.3|:30102... failed: Connection refused.
Success:
wget ip.ap-southeast-2.compute.internal:30102
--2020-06-26 01:15:31-- http://ip.ap-southeast-2.compute.internal:30102/
Resolving ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)... 10.1.95.3
Connecting to ip.ap-southeast-2.compute.internal (ip.ap-southeast-2.compute.internal)|10.1.95.3|:30102... connected.
HTTP request sent, awaiting response... 400 Bad Request
2020-06-26 01:15:31 ERROR 400: Bad Request.
In the nginx logs, we never see the refused connections, whereas we do see the 400 Bad Request ones.
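One way to watch for that across all replicas (a sketch, using the app label from the template below):

kubectl logs -l app=nginx-lua-ssl --tail=100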
I have read about several issues regarding kube-proxy (e.g. https://github.com/kubernetes/kubernetes/issues/38456) and am interested in other insights that might improve this situation.
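Since kube-proxy only forwards NodePort traffic to the backends the Service has selected, checking those backends directly seems worthwhile (a sketch; the Service name is from the template below and the NodePort from the wget output above):

kubectl get endpoints nginx-lua-load-balancer -o wide

# On the affected node, inspect the iptables rules kube-proxy wrote for the NodePort:
sudo iptables-save | grep 30102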
Any help much appreciated.
Kubernetes Template
##
# Main nginx deployment. May require an updated tag for the
# Docker image.
##
---
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: nginx-lua-ssl-deployment
  labels:
    service: https-custom-domains
spec:
  selector:
    matchLabels:
      app: nginx-lua-ssl
  replicas: 5
  template:
    metadata:
      labels:
        app: nginx-lua-ssl
        service: https-custom-domains
    spec:
      containers:
        - name: nginx-lua-ssl
          image: "0000000000.dkr.ecr.ap-southeast-2.amazonaws.com/lua-resty-auto-ssl:v0.NN"
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
            - containerPort: 8443
            - containerPort: 8999
          envFrom:
            - configMapRef:
                name: https-custom-domain-conf
##
# Load balancer which manages traffic into the nginx instance
# In AWS, this uses an NLB (Network Load Balancer), per the annotation below
##
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
  name: nginx-lua-load-balancer
  labels:
    service: https-custom-domains
spec:
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: https
      port: 443
      targetPort: 8443
  externalTrafficPolicy: Local
  selector:
    app: nginx-lua-ssl
  type: LoadBalancer
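Because the Service uses externalTrafficPolicy: Local, the NLB should only pass health checks on nodes that run a matching Pod. A quick way to see the allocated NodePorts and the health-check port (a sketch; field values vary per cluster):

kubectl get service nginx-lua-load-balancer -o yaml | grep -E 'nodePort|healthCheckNodePort'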
In the end, this issue was caused by a utility Pod that was configured incorrectly, such that the load balancer routed traffic to it:

selector:
  matchLabels:
    app: redis-cli
There were five nginx Pods correctly receiving traffic, and one utility Pod incorrectly receiving traffic and refusing the connection, as you would expect.
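A quick way to catch this class of problem is to list every Pod the Service selector actually matches (a sketch, using the selector from the Service above):

kubectl get pods -l app=nginx-lua-ssl -o wide

Any Pod in that list that is not an nginx replica will receive a share of the traffic and, if nothing is listening on the target port, refuse the connection.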
Thanks for the responses.