
RabbitMQ - Error while waiting for Mnesia tables


I installed RabbitMQ from a Helm chart on a Kubernetes cluster. The RabbitMQ pod keeps restarting. The pod logs show the following error:

2020-02-26 04:42:31.582 [warning] <0.314.0> Error while waiting for Mnesia tables: {timeout_waiting_for_tables,[rabbit_durable_queue]}
2020-02-26 04:42:31.582 [info] <0.314.0> Waiting for Mnesia tables for 30000 ms, 6 retries left

Running kubectl describe pod on it shows the following:

Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  data-rabbitmq-0
    ReadOnly:   false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-config
    Optional:  false
  healthchecks:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      rabbitmq-healthchecks
    Optional:  false
  rabbitmq-token-w74kb:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  rabbitmq-token-w74kb
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  beta.kubernetes.io/arch=amd64
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                      From                                               Message
  ----     ------     ----                     ----                                               -------
  Warning  Unhealthy  3m27s (x878 over 7h21m)  kubelet, gke-analytics-default-pool-918f5943-w0t0  Readiness probe failed: Timeout: 70 seconds ...
Checking health of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
Status of node rabbit@rabbitmq-0.rabbitmq-headless.default.svc.cluster.local ...
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}
Error:
{:aborted, {:no_exists, [:rabbit_vhost, [{{:vhost, :"$1", :_, :_}, [], [:"$1"]}]]}}

I provisioned this on a Kubernetes cluster on Google Cloud. I am not sure what exactly triggered the failure. I had to restart the pod, and it has been failing ever since.

What is the issue here?


Solution

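  • Try deploying RabbitMQ with the following manifest (a 3-replica StatefulSet that uses the Kubernetes peer discovery plugin):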

    kind: Service
    apiVersion: v1
    metadata:
      namespace: rabbitmq-namespace
      # Externally reachable NodePort service. It is named rabbitmq-lb so that the name
      # "rabbitmq" stays free for the headless service the StatefulSet refers to.
      name: rabbitmq-lb
      labels:
        app: rabbitmq
        type: LoadBalancer
    spec:
      type: NodePort
      ports:
       - name: http
         protocol: TCP
         port: 15672
         targetPort: 15672
         nodePort: 31672
       - name: amqp
         protocol: TCP
         port: 5672
         targetPort: 5672
         nodePort: 30672
       - name: stomp
         protocol: TCP
         port: 61613
         targetPort: 61613
      selector:
        app: rabbitmq
    ---
    kind: Service
    apiVersion: v1
    metadata:
      namespace: rabbitmq-namespace
      # Must match serviceName in the StatefulSet below.
      name: rabbitmq
      labels:
        app: rabbitmq
    spec:
      # Headless service that gives each StatefulSet pod a stable DNS name inside the cluster
      # (<pod>.<service>.<namespace>.svc.cluster.local),
      # in our case rabbitmq-#.rabbitmq.rabbitmq-namespace.svc.cluster.local
      clusterIP: None
      ports:
       - name: http
         protocol: TCP
         port: 15672
         targetPort: 15672
       - name: amqp
         protocol: TCP
         port: 5672
         targetPort: 5672
       - name: stomp
         port: 61613
      selector:
        app: rabbitmq
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: rabbitmq-config
      namespace: rabbitmq-namespace
    data:
      enabled_plugins: |
          [rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_stomp].
    
      rabbitmq.conf: |
          ## Cluster formation. See http://www.rabbitmq.com/cluster-formation.html to learn more.
          cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
          cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
          ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
          ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
          ## Set to "hostname" to use pod hostnames.
          ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
          ## environment variable.
          cluster_formation.k8s.address_type = hostname   
          ## Important - this is the suffix of the hostname, as each node gets "rabbitmq-#", we need to tell what's the suffix
          ## it will give each new node that enters the way to contact the other peer node and join the cluster (if using hostname)
          cluster_formation.k8s.hostname_suffix = .rabbitmq.rabbitmq-namespace.svc.cluster.local
          ## How often should node cleanup checks run?
          cluster_formation.node_cleanup.interval = 30
          ## Set to false if automatic removal of unknown/absent nodes
          ## is desired. This can be dangerous, see
          ##  * http://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
          ##  * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
          cluster_formation.node_cleanup.only_log_warning = true
          cluster_partition_handling = autoheal
          ## See http://www.rabbitmq.com/ha.html#master-migration-data-locality
          queue_master_locator=min-masters
          ## See http://www.rabbitmq.com/access-control.html#loopback-users
          loopback_users.guest = false
    ---
    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: rabbitmq
      namespace: rabbitmq-namespace
    spec:
      serviceName: rabbitmq
      replicas: 3
      selector:
        matchLabels:
          name: rabbitmq
      template:
        metadata:
          labels:
            app: rabbitmq
            name: rabbitmq
            state: rabbitmq
          annotations:
            pod.alpha.kubernetes.io/initialized: "true"
        spec:
          serviceAccountName: rabbitmq
          terminationGracePeriodSeconds: 10
          containers:        
          - name: rabbitmq-k8s
            image: rabbitmq:3.8.3
            volumeMounts:
              - name: config-volume
                mountPath: /etc/rabbitmq
              - name: data
                mountPath: /var/lib/rabbitmq/mnesia
            ports:
              - name: http
                protocol: TCP
                containerPort: 15672
              - name: amqp
                protocol: TCP
                containerPort: 5672
            livenessProbe:
              exec:
                command: ["rabbitmqctl", "status"]
              initialDelaySeconds: 60
              periodSeconds: 60
              timeoutSeconds: 10
            resources:
                requests:
                  memory: "0"
                  cpu: "0"
                limits:
                  memory: "2048Mi"
                  cpu: "1000m"
            readinessProbe:
              exec:
                command: ["rabbitmqctl", "status"]
              initialDelaySeconds: 20
              periodSeconds: 60
              timeoutSeconds: 10
            imagePullPolicy: Always
            env:
              - name: MY_POD_IP
                valueFrom:
                  fieldRef:
                    fieldPath: status.podIP
              - name: NAMESPACE
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.namespace
              - name: HOSTNAME
                valueFrom:
                  fieldRef:
                    fieldPath: metadata.name
              - name: RABBITMQ_USE_LONGNAME
                value: "true"
              # See a note on cluster_formation.k8s.address_type in the config file section
              - name: RABBITMQ_NODENAME
                value: "rabbit@$(HOSTNAME).rabbitmq.$(NAMESPACE).svc.cluster.local"
              - name: K8S_SERVICE_NAME
                value: "rabbitmq"
              - name: RABBITMQ_ERLANG_COOKIE
                value: "mycookie"      
          volumes:
            - name: config-volume
              configMap:
                name: rabbitmq-config
                items:
                - key: rabbitmq.conf
                  path: rabbitmq.conf
                - key: enabled_plugins
                  path: enabled_plugins
      volumeClaimTemplates:
      - metadata:
          name: data
        spec:
          accessModes:
            - "ReadWriteOnce"
          # Must name a StorageClass that exists in your cluster (check with: kubectl get storageclass)
          storageClassName: "default"
          resources:
            requests:
              storage: 3Gi
    
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: rabbitmq 
      namespace: rabbitmq-namespace 
    ---
    kind: Role
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: endpoint-reader
      namespace: rabbitmq-namespace 
    rules:
    - apiGroups: [""]
      resources: ["endpoints"]
      verbs: ["get"]
    ---
    kind: RoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: endpoint-reader
      namespace: rabbitmq-namespace
    subjects:
    - kind: ServiceAccount
      name: rabbitmq
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: endpoint-reader
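
The manifest ties three values together: the RABBITMQ_NODENAME pattern, cluster_formation.k8s.hostname_suffix, and the StatefulSet's serviceName. If they disagree, the nodes cannot find each other by name. As a rough sanity check, the sketch below prints the node name each replica should end up with, built from the same pattern as RABBITMQ_NODENAME. The node_name helper is hypothetical (it is not part of RabbitMQ or kubectl), and the service name, namespace, and replica count are simply copied from the manifest above.

```shell
#!/bin/sh
# Hypothetical helper: builds the node name from the same pattern as
# RABBITMQ_NODENAME in the StatefulSet:
#   rabbit@$(HOSTNAME).rabbitmq.$(NAMESPACE).svc.cluster.local
node_name() {
  pod="$1"
  service="$2"
  namespace="$3"
  printf 'rabbit@%s.%s.%s.svc.cluster.local\n' "$pod" "$service" "$namespace"
}

# Values copied from the manifest (serviceName, namespace, replicas).
SERVICE=rabbitmq
NAMESPACE=rabbitmq-namespace
REPLICAS=3

# StatefulSet pods are named <statefulset-name>-<ordinal>, starting at 0.
i=0
while [ "$i" -lt "$REPLICAS" ]; do
  node_name "rabbitmq-$i" "$SERVICE" "$NAMESPACE"
  i=$((i + 1))
done
```

Once the pods are running, the output of kubectl exec -n rabbitmq-namespace rabbitmq-0 -- rabbitmqctl cluster_status should list the same node names. A mismatch usually points at the hostname suffix or the service name.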