linux, azure, kubernetes, azure-aks, azure-files

AKS no longer able to mount Azure Files PVC: "mount error: cifs filesystem not supported by the system / mount error(19): No such device"


This deployment has been running fine for months. It looks like the Pods redeployed early this morning, I think probably related to applying the 2023.10.31 AKSSecurityPatchedVHD node image.

The Pods that mount Azure Files for file storage are stuck in ContainerCreating with the following error:

Events:
  Type     Reason       Age                    From               Message
  ----     ------       ----                   ----               -------
  Normal   Scheduled    3m46s                  default-scheduler  Successfully assigned env/<deployment> to <aks-node>
  Warning  FailedMount  3m45s (x2 over 3m46s)  kubelet            MountVolume.MountDevice failed for volume "<pvc>" : rpc error: code = Internal desc = volume(<resource-group>) mount //<stuff>.file.core.windows.net/<pvc> on /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/<stuff>/globalmount failed with mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t cifs -o mfsymlinks,actimeo=30,nosharesock,file_mode=0777,dir_mode=0777,<masked> //<stuff>.file.core.windows.net/<pvc> /var/lib/kubelet/plugins/kubernetes.io/csi/file.csi.azure.com/<stuff>/globalmount
Output: mount error: cifs filesystem not supported by the system
mount error(19): No such device
Refer to the mount.cifs(8) manual page (e.g. man mount.cifs) and kernel log messages (dmesg)

Please refer to http://aka.ms/filemounterror for possible causes and solutions for mount errors.
  Warning  FailedMount  104s  kubelet  Unable to attach or mount volumes: unmounted volumes=[file-storage], unattached volumes=[file-storage kube-api-access-xbprr]: timed out waiting for the condition
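
The error suggests the node's kernel can't provide the cifs filesystem at all, rather than a credentials or networking problem. One way to confirm this directly on the node is a debug pod (a sketch: ubuntu is just a convenient image, and <aks-node> is the node name from the events above):

# Start a privileged debug pod on the affected node and switch into the host filesystem
kubectl debug node/<aks-node> -it --image=ubuntu
chroot /host

# Try to load the cifs module; on an affected kernel this fails with
# "FATAL: Module cifs not found in directory /lib/modules/<kernel-version>"
modprobe cifs

# Note the running kernel version for comparison with a healthy node
uname -r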

Kind of stumped. The issue persists and I'm not sure what to try next, other than redeploying everything.

There isn't anything helpful at http://aka.ms/filemounterror. Nothing has changed in the environment for months. Another environment that is basically a duplicate of this one is running fine, so the problem seems isolated to this environment. These are Linux nodes.
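
Since the healthy environment is basically a duplicate, comparing node kernel versions between the two is a quick check; kubectl get nodes -o wide prints a KERNEL-VERSION column. A sketch (the context names are placeholders for the two clusters):

# Kernel versions in the affected cluster
kubectl --context affected-cluster get nodes -o wide

# Kernel versions in the healthy duplicate environment
kubectl --context healthy-cluster get nodes -o wide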


My storage.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: file-storage
  namespace: env
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: azurefile
  resources:
    requests:
      storage: 25Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-storage
  namespace: env
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: default
  resources:
    requests:
      storage: 25Gi

postgres-storage seems to be fine; it's file-storage that is the issue.
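
For what it's worth, the claim and its storage class can be inspected to confirm the share is still Bound and provisioned by the Azure Files CSI driver (file.csi.azure.com, as seen in the mount path in the events above). A sketch:

# Confirm the claim is Bound and find the backing PV
kubectl -n env get pvc file-storage

# The provisioner on the azurefile class should be file.csi.azure.com
kubectl get storageclass azurefile -o yaml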


Solution

  • This didn't affect us on AKS, but it did affect other VMs in our Azure tenant; it seems there is an issue with the CIFS module not being included in the Microsoft kernel builds.

    Between 6.2.0-1015 and 6.2.0-1016, the CIFS module was moved from fs/cifs/* to fs/smb/client/*, fs/smb/common/* and fs/smb/server/*. The inclusion list (root/debian.azure-6.2/control.d/azure.inclusion-list) was not updated for this change, so the module is not included in the linux-modules-6.2.0-1016-azure package.

    I'm not sure why it hasn't affected Kubernetes version 1.27.3; perhaps Microsoft haven't moved that to the 6.2.0-1016 kernel yet?

    A workaround has been posted:

    ## Install older kernel
    sudo apt install linux-image-6.2.0-1015-azure
    
    ## Remove newer kernel (select NO when asked)
    sudo apt remove linux-image-6.2.0-1016-azure
    
    ## Reboot
    sudo reboot
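
    To confirm you're hitting this on a given node or VM, you can check whether cifs.ko actually ships with the running kernel. A sketch (read-only checks apart from modprobe; the last command assumes the 6.2.0-1015 package is installed per the workaround above):

    ## Look for the cifs module in the running kernel's module tree (empty output on affected kernels)
    find /lib/modules/$(uname -r) -name 'cifs.ko*'
    
    ## On an affected kernel this fails with "FATAL: Module cifs not found"
    sudo modprobe cifs
    
    ## Compare against the older kernel's module list
    dpkg -L linux-modules-6.2.0-1015-azure | grep cifs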