This deployment has been running fine for months. It looks like the Pods were redeployed early this morning, probably related to applying the 2023.10.31 AKSSecurityPatchedVHD update.
The Pods that mount Azure Files for file storage are stuck in ContainerCreating
with the following error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m46s default-scheduler Successfully assigned env/<deployment> to <aks-node>
Warning FailedMount 3m45s (x2 over 3m46s) kubelet MountVolume.MountDevice failed for volume "<pvc>" : rpc error: code = Internal desc = volume(<resource-group>) mount //<stuff>.file.core.windows.net/<pvc> on /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/<stuff>/globalmount failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o mfsymlinks,actimeo=30,nosharesock,file_mode=0777,dir_mode=0777,<masked> //<stuff>.file.core.windows.net/<pvc> /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/<stuff>/globalmount
Output: mount error: cifs filesystem not supported by the system
mount error(19): No such device
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)
Please refer to http://aka.ms/filemounterror for possible causes and solutions for mount errors.
Warning FailedMount 104s kubelet Unable to attach or mount volumes: unmounted volumes=[file-storage], unattached volumes=[file-storage kube-api-access-xbprr]: timed out waiting for the condition
Kind of stumped. Nothing I've tried so far has helped; the issue persists and I'm not sure what to try next other than redeploying everything.
There isn't anything helpful at http://aka.ms/filemounterror. Nothing has changed in the environment for months. Another environment that is basically a duplicate of this one is running fine, so it seems isolated to this one. These are Linux nodes.
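One thing worth confirming is whether the node itself can still register the cifs filesystem at all. A minimal sketch (the `cifs_supported` helper and the `kubectl debug` approach for getting a node shell are my own assumptions, not from the original post):

```shell
#!/bin/sh
# Succeeds if the cifs filesystem appears in the given
# /proc/filesystems-style listing (on a node, pass /proc/filesystems).
cifs_supported() {
  grep -qw cifs "$1"
}

# On the node (e.g. via `kubectl debug node/<aks-node> -it --image=ubuntu`
# followed by `chroot /host`), you could run:
#   uname -r                                   # which kernel is the node on?
#   modprobe cifs                              # try to load the module
#   cifs_supported /proc/filesystems && echo "CIFS OK" || echo "CIFS missing"
```

If `modprobe cifs` fails and `/proc/filesystems` has no cifs entry, the mount error 19 ("No such device") is coming from the kernel, not from the CSI driver or the storage account.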
My storage.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: file-storage
  namespace: env
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 25Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-storage
  namespace: env
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: default
  resources:
    requests:
      storage: 25Gi
postgres-storage seems to be fine; it is file-storage that is the issue.
This didn't affect us on AKS, but it did affect other VMs in our Azure tenant; it seems there is an issue with the CIFS module not being included in the Microsoft kernel builds.
Between 6.2.0-1015 and 6.2.0-1016, the CIFS module was moved from fs/cifs/* to fs/smb/client/*, fs/smb/common/* and fs/smb/server/*. The inclusion list (root/debian.azure-6.2/control.d/azure.inclusion-list) was not updated for this change, so the module is not included in the linux-modules-6.2.0-1026-azure package.
I'm not sure why it hasn't affected Kubernetes version 1.27.3; perhaps Microsoft haven't moved that to the 6.2.0-1016 kernel yet?
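You can confirm the module file is genuinely absent from the installed kernel package on an affected node. A sketch (the `cifs_module_present` helper is mine; on a real node you would point it at /lib/modules for the running kernel):

```shell
#!/bin/sh
# Succeeds if a cifs kernel module file exists anywhere under the given
# modules directory (on a node: /lib/modules/$(uname -r)).
cifs_module_present() {
  find "$1" -name 'cifs.ko*' 2>/dev/null | grep -q .
}

# On the node:
#   cifs_module_present "/lib/modules/$(uname -r)" \
#     && echo "cifs.ko shipped" \
#     || echo "cifs.ko missing from this kernel package"
```

On a kernel built from the 1015 package the module should show up under kernel/fs/cifs/ (or kernel/fs/smb/client/ after the source move); on an affected build the search comes back empty.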
A workaround has been posted:
## Install older kernel
sudo apt install linux-image-6.2.0-1015-azure
## Remove newer kernel (select NO when asked)
sudo apt remove linux-image-6.2.0-1016-azure
## Reboot
sudo reboot
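After the reboot it's worth checking the node actually came back on the pinned kernel. A small sketch (the `on_pinned_kernel` helper and the `apt-mark hold` step are my additions, not part of the posted workaround):

```shell
#!/bin/sh
# Succeeds if the given kernel release string is the known-working one.
# On the node you would pass "$(uname -r)".
on_pinned_kernel() {
  [ "$1" = "6.2.0-1015-azure" ]
}

# Example on the node:
#   on_pinned_kernel "$(uname -r)" && echo "still on 6.2.0-1015" \
#     || echo "kernel drifted from the pinned version"
#
# Optionally stop apt/unattended-upgrades pulling the broken kernel back in:
#   sudo apt-mark hold linux-image-6.2.0-1015-azure
```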