I have a Kubernetes cluster at version 1.16.2. When I deploy all the services in the cluster with replicas set to 1, everything works fine. Then I scaled every service's replicas to 2 and checked again: some services are Running normally, but others are stuck in Pending.
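The scaling step was done with commands like the following (one deployment shown as an example; the names come from the pod list further below):

kubectl -n runsdata scale deployment society-resident-service-v3-0 --replicas=2

When I kubectl describe one of the Pending pods, I get the message below: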
[root@runsdata-bj-01 society-training-service-v1-0]# kcd society-resident-service-v3-0-788446c49b-rzjsx
Name: society-resident-service-v3-0-788446c49b-rzjsx
Namespace: runsdata
Priority: 0
Node: <none>
Labels: app=society-resident-service-v3-0
pod-template-hash=788446c49b
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/society-resident-service-v3-0-788446c49b
Containers:
society-resident-service-v3-0:
Image: docker.ssiid.com/society-resident-service:3.0.33
Port: 8231/TCP
Host Port: 0/TCP
Limits:
cpu: 1
memory: 4Gi
Requests:
cpu: 200m
memory: 2Gi
Liveness: http-get http://:8231/actuator/health delay=600s timeout=5s period=10s #success=1 #failure=3
Readiness: http-get http://:8231/actuator/health delay=30s timeout=5s period=10s #success=1 #failure=3
Environment:
spring_profiles_active: production
TZ: Asia/Hong_Kong
JAVA_OPTS: -Djgroups.use.jdk_logger=true -Xmx4000M -Xms4000M -Xmn600M -XX:PermSize=500M -XX:MaxPermSize=500M -Xss384K -XX:+DisableExplicitGC -XX:SurvivorRatio=1 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0 -XX:+CMSClassUnloadingEnabled -XX:LargePageSizeInBytes=128M -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=80 -XX:SoftRefLRUPolicyMSPerMB=0 -XX:+PrintClassHistogram -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -Xloggc:log/gc.log
Mounts:
/data/storage from nfs-data-storage (rw)
/opt/security from security (rw)
/var/log/runsdata from log (rw)
/var/run/secrets/kubernetes.io/serviceaccount from application-token-vgcvb (ro)
Conditions:
Type Status
PodScheduled False
Volumes:
log:
Type: HostPath (bare host directory volume)
Path: /log/runsdata
HostPathType:
security:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-security-claim
ReadOnly: false
nfs-data-storage:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: data-storage-claim
ReadOnly: false
application-token-vgcvb:
Type: Secret (a volume populated by a Secret)
SecretName: application-token-vgcvb
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/4 nodes are available: 4 Insufficient memory.
And from the output below, you can see that each machine has more than 2G of memory left.
[root@runsdata-bj-01 society-training-service-v1-0]# kcp |grep Pending
society-insurance-foundation-service-v2-0-7697b9bd5b-7btq6 0/1 Pending 0 60m
society-notice-service-v1-0-548b8d5946-c5gzm 0/1 Pending 0 60m
society-online-business-service-v2-1-7f897f564-phqjs 0/1 Pending 0 60m
society-operation-gateway-7cf86b77bd-lmswm 0/1 Pending 0 60m
society-operation-user-service-v1-1-755dcff964-dr9mj 0/1 Pending 0 60m
society-resident-service-v3-0-788446c49b-rzjsx 0/1 Pending 0 60m
society-training-service-v1-0-774f8c5d98-tl7vq 0/1 Pending 0 60m
society-user-service-v3-0-74865dd9d7-t9fwz 0/1 Pending 0 60m
traefik-ingress-controller-8688cccf79-5gkjg 0/1 Pending 0 60m
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl top nodes
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
192.168.0.94 384m 9% 11482Mi 73%
192.168.0.95 399m 9% 11833Mi 76%
192.168.0.96 399m 9% 11023Mi 71%
192.168.0.97 457m 11% 10782Mi 69%
[root@runsdata-bj-01 society-training-service-v1-0]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.0.94 Ready <none> 8d v1.16.2
192.168.0.95 Ready <none> 8d v1.16.2
192.168.0.96 Ready <none> 8d v1.16.2
192.168.0.97 Ready <none> 8d v1.16.2
[root@runsdata-bj-01 society-training-service-v1-0]#
Here is the relevant part of the description of all 4 nodes:
[root@runsdata-bj-01 frontend]#kubectl describe node 192.168.0.94
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1930m (48%) 7600m (190%)
memory 9846Mi (63%) 32901376Ki (207%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]#kubectl describe node 192.168.0.95
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1670m (41%) 6600m (165%)
memory 7196Mi (46%) 21380Mi (137%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.96
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2610m (65%) 7 (175%)
memory 9612Mi (61%) 19960Mi (128%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
[root@runsdata-bj-01 frontend]# kubectl describe node 192.168.0.97
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 2250m (56%) 508200m (12705%)
memory 10940Mi (70%) 28092672Ki (176%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
And the memory of all 4 nodes:
[root@runsdata-bj-00 ~]# free -h
total used free shared buff/cache available
Mem: 15G 2.8G 6.7G 2.1M 5.7G 11G
Swap: 0B 0B 0B
[root@runsdata-bj-01 frontend]# free -h
total used free shared buff/cache available
Mem: 15G 7.9G 3.7G 2.4M 3.6G 6.8G
Swap: 0B 0B 0B
[root@runsdata-bj-02 ~]# free -h
total used free shared buff/cache available
Mem: 15G 5.0G 2.9G 3.9M 7.4G 9.5G
Swap: 0B 0B 0B
[root@runsdata-bj-03 ~]# free -h
total used free shared buff/cache available
Mem: 15G 6.5G 2.2G 2.3M 6.6G 8.2G
Swap: 0B 0B 0B
Here is the kube-scheduler log:
[root@runsdata-bj-01 log]# cat messages|tail -n 5000|grep kube-scheduler
Apr 17 14:31:24 runsdata-bj-01 kube-scheduler: E0417 14:31:24.404442 12740 factory.go:585] pod is already present in the activeQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.490310 12740 factory.go:585] pod is already present in the backoffQ
Apr 17 14:31:25 runsdata-bj-01 kube-scheduler: E0417 14:31:25.873292 12740 factory.go:585] pod is already present in the backoffQ
Apr 18 21:44:18 runsdata-bj-01 etcd: read-only range request "key:\"/registry/services/endpoints/kube-system/kube-scheduler\" " with result "range_response_count:1 size:440" took too long (100.521269ms) to execute
Apr 18 21:59:40 runsdata-bj-01 kube-scheduler: E0418 21:59:40.050852 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.069465 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:07 runsdata-bj-01 kube-scheduler: E0418 22:03:07.950254 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:08 runsdata-bj-01 kube-scheduler: E0418 22:03:08.567290 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.152812 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:03:09 runsdata-bj-01 kube-scheduler: E0418 22:03:09.344902 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:04:32 runsdata-bj-01 kube-scheduler: E0418 22:04:32.969606 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:09:51 runsdata-bj-01 kube-scheduler: E0418 22:09:51.366877 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.430976 12740 factory.go:585] pod is already present in the activeQ
Apr 18 22:32:16 runsdata-bj-01 kube-scheduler: E0418 22:32:16.441182 12740 factory.go:585] pod is already present in the activeQ
I have searched Google and Stack Overflow and could not find a solution. Who can help me?
Kubernetes preserves node stability rather than resource provisioning, and the available memory is not calculated from the free -m command, as the documentation mentions:

The value for memory.available is derived from the cgroupfs instead of tools like free -m. This is important because free -m does not work in a container, and if users use the node allocatable feature, out of resource decisions are made local to the end user Pod part of the cgroup hierarchy as well as the root node. This script reproduces the same set of steps that the kubelet performs to calculate memory.available. The kubelet excludes inactive_file (i.e. # of bytes of file-backed memory on inactive LRU list) from its calculation as it assumes that memory is reclaimable under pressure.
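For reference, here is a minimal sketch of that script, following the steps described in the Kubernetes out-of-resource handling docs (it assumes cgroup v1 paths on the node):

#!/bin/bash
# Reproduce the kubelet's memory.available calculation for the root cgroup.
memory_capacity_in_kb=$(grep MemTotal /proc/meminfo | awk '{print $2}')
memory_capacity_in_bytes=$((memory_capacity_in_kb * 1024))
# Current usage as seen by the root memory cgroup (cgroup v1).
memory_usage_in_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
# inactive_file is excluded: the kubelet assumes it can be reclaimed under pressure.
memory_total_inactive_file=$(grep total_inactive_file /sys/fs/cgroup/memory/memory.stat | awk '{print $2}')
memory_working_set=$((memory_usage_in_bytes - memory_total_inactive_file))
[ "$memory_working_set" -lt 0 ] && memory_working_set=0
memory_available_in_bytes=$((memory_capacity_in_bytes - memory_working_set))
echo "memory.available: $((memory_available_in_bytes / 1024 / 1024))Mi"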
You can use the script above to check the memory available on your nodes, and if there really is no resource left you will need to increase the cluster size by adding a new node.
Additionally, you can check the documentation page for more information about resource limits: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
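One more check that can help here: the scheduler places pods based on the declared requests versus each node's allocatable, not on actual usage, so it is worth comparing those two numbers per node. A sketch, using the node names from the output above:

kubectl get nodes -o custom-columns=NAME:.metadata.name,ALLOCATABLE_MEMORY:.status.allocatable.memory
kubectl describe node 192.168.0.94 | grep -A 6 'Allocated resources'

If allocatable minus the memory already requested leaves less than a pod's 2Gi request on every node, that pod stays Pending regardless of what free -h reports.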