I'm trying to run K8ssandra, but the Cassandra container keeps failing with the following messages (repeating over and over):
WARN [epollEventLoopGroup-374-2] 2021-12-30 23:54:23,711 AbstractBootstrap.java:452 - Unknown channel option 'TCP_NODELAY' for channel '[id: 0x7cf79bf5]'
WARN [epollEventLoopGroup-374-2] 2021-12-30 23:54:23,712 Loggers.java:39 - [s369] Error connecting to Node(endPoint=/tmp/cassandra.sock, hostId=null, hashCode=7ec5e39e), trying next node (FileNotFoundException: null)
INFO [nioEventLoopGroup-2-1] 2021-12-30 23:54:23,713 Cli.java:617 - address=/100.97.28.180:53816 url=/api/v0/metadata/endpoints status=500 Internal Server Error
and from the server-system-logger container:
tail: cannot open '/var/log/cassandra/system.log' for reading: No such file or directory
and finally, in the cass-operator pod:
2021-12-30T23:56:22.580Z INFO controllers.CassandraDatacenter incorrect status code when calling Node Management Endpoint {"cassandradatacenter": "default/dc1", "requestNamespace": "default", "requestName": "dc1", "loopID": "d1f81abc-6b68-4e63-9e95-1c2b5f6d4e9d", "namespace": "default", "datacenterName": "dc1", "clusterName": "mydomaincom", "statusCode": 500, "pod": "100.122.58.236"}
2021-12-30T23:56:22.580Z ERROR controllers.CassandraDatacenter Could not get endpoints data {"cassandradatacenter": "default/dc1", "requestNamespace": "default", "requestName": "dc1", "loopID": "d1f81abc-6b68-4e63-9e95-1c2b5f6d4e9d", "namespace": "default", "datacenterName": "dc1", "clusterName": "mydomaincom", "error": "incorrect status code of 500 when calling endpoint"}
I'm not really sure what's happening here. It works fine with the same config on a local Minikube cluster, but I can't get it to work on my AWS cluster (running Kubernetes v1.20.10). All the other pods are running fine:
NAME READY STATUS RESTARTS AGE
mydomaincom-dc1-rac1-sts-0 2/3 Running 0 17m
k8ssandra-cass-operator-8675f58b89-qt2dx 1/1 Running 0 29m
k8ssandra-medusa-operator-589995d979-rnjhr 1/1 Running 0 29m
k8ssandra-reaper-operator-5d9d5d975d-c6nhv 1/1 Running 0 29m
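If it helps, the CassandraDatacenter resource never reports ready either; a sketch of how to inspect its status conditions (assuming the default namespace, as in the operator logs above):

kubectl -n default get cassandradatacenter dc1 -o jsonpath='{.status.conditions}'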
The pod events show this:
Warning Unhealthy 109s (x88 over 16m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
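The kubelet probe is just an HTTP GET against the management API, so the failure can be reproduced by hand without waiting for the probe; a sketch, assuming the management API is listening on its default port 8080 and that curl is present in the image:

# hit the same readiness endpoint the kubelet probes; print only the HTTP status code
kubectl exec mydomaincom-dc1-rac1-sts-0 -c cassandra -- \
  curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8080/api/v0/probes/readiness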
My values.yaml (deployed with Helm 3):
cassandra:
  enabled: true
  version: "4.0.1"
  versionImageMap:
    3.11.7: k8ssandra/cass-management-api:3.11.7-v0.1.33
    3.11.8: k8ssandra/cass-management-api:3.11.8-v0.1.33
    3.11.9: k8ssandra/cass-management-api:3.11.9-v0.1.27
    3.11.10: k8ssandra/cass-management-api:3.11.10-v0.1.27
    3.11.11: k8ssandra/cass-management-api:3.11.11-v0.1.33
    4.0.0: k8ssandra/cass-management-api:4.0.0-v0.1.33
    4.0.1: k8ssandra/cass-management-api:4.0.1-v0.1.33
  clusterName: "mydomain.com"
  auth:
    enabled: true
    superuser:
      secret: ""
      username: ""
  cassandraLibDirVolume:
    storageClass: default
    size: 100Gi
  encryption:
    keystoreSecret:
    keystoreMountPath:
    truststoreSecret:
    truststoreMountPath:
  additionalSeeds: []
  heap: {}
  resources:
    requests:
      memory: 4Gi
      cpu: 500m
    limits:
      memory: 4Gi
      cpu: 1000m
  datacenters:
    - name: dc1
      size: 1
      racks:
        - name: rac1
      heap: {}
  ingress:
    enabled: false
stargate:
  enabled: false
reaper:
  autoschedule: true
  enabled: true
  cassandraUser:
    secret: ""
    username: ""
  jmx:
    secret: ""
    username: ""
medusa:
  enabled: true
  image:
    registry: docker.io
    repository: k8ssandra/medusa
    tag: 0.11.3
  cassandraUser:
    secret: ""
    username: ""
  storage_properties:
    region: us-east-1
  bucketName: my-bucket-name
  storageSecret: medusa-bucket-key
reaper-operator:
  enabled: true
monitoring:
  grafana:
    provision_dashboards: false
  prometheus:
    provision_service_monitors: false
kube-prometheus-stack:
  enabled: false
  prometheusOperator:
    enabled: false
    serviceMonitor:
      selfMonitor: false
  prometheus:
    enabled: false
  grafana:
    enabled: false
UPDATE: I was able to fix this by increasing the memory request and limit to 12Gi.
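For anyone who hits the same thing: my read is that with only 4Gi Cassandra never finished starting (hence the missing system.log and the FileNotFoundException on /tmp/cassandra.sock), so the management API had nothing to talk to and answered 500. This is the only part of values.yaml I changed; the cpu figures are unchanged from above:

cassandra:
  resources:
    requests:
      memory: 12Gi   # was 4Gi
      cpu: 500m
    limits:
      memory: 12Gi   # was 4Gi
      cpu: 1000m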