kubernetes, weave

Kubectl is working but I can't access any of the components


We have a small private k8s cluster. Until this morning everything was working, but now only kubectl works and no traffic is going through.

I can launch new deployments, delete them, etc., and I can see that they are up and running.

But when I try to access them via HTTP, AMQP, etc., I can't.
I watched our nginx logs while loading the homepage: nothing appeared in the logs and nothing loaded in the browser, which means nginx received no traffic at all.
We are using Weave Net as our CNI.

I checked the DNS logs and also tested DNS directly; it is working. I don't know where to start looking to solve this problem. Any suggestions?
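For reference, an in-cluster DNS check can be done with a throwaway pod (this is a generic sketch, not the exact test I ran; `kubernetes.default` is just a service name that always exists):

```shell
# Spin up a temporary busybox pod and resolve an in-cluster service name.
# If this succeeds while HTTP/AMQP traffic still fails, DNS is not the culprit.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- \
  nslookup kubernetes.default.svc.cluster.local
```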

Update

After a few hours the problem mostly resolved itself and I can access my applications again, but I want to ask a closely related question:

Is there a way to tell whether a problem like this comes from the underlying (host) network or from the cluster network (the internal k8s overlay)? I am asking because in the past I had a problem with k8s DNS, and this time I suspected the CNI.
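One way to separate the two layers is to test node-to-node connectivity first, then pod-to-pod connectivity across nodes over the overlay. A sketch (the IPs and pod name below are placeholders):

```shell
# 1. Host network layer: from one node, ping another node's IP.
#    (10.0.0.2 is a placeholder node IP.)
ping -c 3 10.0.0.2

# 2. Overlay layer: exec into a pod and ping the IP of a pod
#    running on a *different* node (10.32.0.5 is a placeholder pod IP).
kubectl exec -it some-pod -- ping -c 3 10.32.0.5
```

If step 1 works but step 2 fails, the problem is in the CNI/overlay; if step 1 also fails, it is the underlying network.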

Update 2

Now I see this error in weave:

ERRO: 2019/09/27 11:10:03.358321 Captured frame from MAC (d2:14:2a:47:62:d9) to (02:01:5b:b9:8e:fd) associated with another peer 4a:8d:75:d7:59:ff(serflex-argus-2)
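This message suggests Weave captured a frame from a MAC address it believes belongs to a different peer, which can point to a confused or flapping mesh. A sketch of checks that surface this (namespace and label are those of the standard Weave Net DaemonSet manifest; `$WEAVE_POD` is a placeholder):

```shell
# Tail the weave container logs on every node to see how often
# the error recurs and on which peers.
kubectl logs -n kube-system -l name=weave-net -c weave --tail=50

# Check mesh connection health from inside one weave pod.
kubectl exec -n kube-system $WEAVE_POD -c weave -- \
  /home/weave/weave --local status connections
```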

And my environment:

Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.1", GitCommit:"b7394102d6ef778017f2ca4046abbaa23b88c290", GitTreeState:"clean", BuildDate:"2019-04-08T17:11:31Z", GoVersion:"go1.12.1", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
/home/weave # ./weave --local status
        Version: 2.5.2 (up to date; next check at 2019/09/27 15:12:49)
        Service: router
       Protocol: weave 1..2
           Name: 02:01:5b:b9:8e:fd(k8s-master)
     Encryption: disabled
  PeerDiscovery: enabled
        Targets: 1
    Connections: 5 (4 established, 1 failed)
          Peers: 5 (with 20 established connections)
 TrustedSubnets: none
        Service: ipam
         Status: ready

Solution

  • I couldn't find a solution for this problem, so I had to tear down the cluster and recreate it, this time with Calico. After a week of running there have been no problems.

    The only thing I can point to as a likely cause is the 200 MiB memory limit on the Weave pods: 4 out of 5 of them were hitting that limit, and on Weave's GitHub I found reports of a memory leak. Because of this I decided to change the CNI.
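To check whether the Weave pods are hitting their memory limit, you can compare actual usage against the configured limit (a sketch; `kubectl top` requires metrics-server, and the DaemonSet/label names are those of the standard Weave Net manifest):

```shell
# Actual memory usage of the weave-net pods.
kubectl top pods -n kube-system -l name=weave-net

# Configured memory limit on the weave container.
kubectl get daemonset weave-net -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].resources.limits.memory}'
```

Restart counts from `kubectl get pods -n kube-system -l name=weave-net` are also telling: pods repeatedly OOM-killed will show a climbing RESTARTS column.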