Tags: linux, kubernetes, linux-capabilities

Running a container with runAsNonRoot and added capabilities


I am trying to run my pod as a non-root user and also grant it some capabilities.
This is my config:

containers:
  - name: container-name
    securityContext:
      capabilities:
        add: ["SETUID", "SYS_TIME"]
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1001

When I deploy my pod and connect to it, I run ps aux and see:

PID   USER     TIME  COMMAND
    1 root      0:32 node bla.js
  205 root      0:00 /bin/bash
  212 root      0:00 ps aux

I then run cat /proc/1/status and see:

CapPrm: 0000000000000000
CapEff: 0000000000000000

This means the container's process has no capabilities.
However, if I remove the runAsNonRoot: true flag from the securityContext, I can see that the process does have multiple capabilities.
Is there a way to run a pod as non-root and still add some capabilities?


Solution

  • This is the expected behavior. Capabilities divide the privileges traditionally associated with the superuser (root) into distinct units. A non-root user cannot enable or disable such capabilities for itself, since that would create a security breach.

    The capabilities field in the securityContext is designed to manage (either limit or expand) the Linux capabilities for the container's context. In a pod run as root, the container's processes inherit those capabilities, because the processes are owned by the root user. If the pod is run as a non-root user, it does not matter that the context has those capabilities enabled: they still show up in the inheritable and bounding sets, but the Linux kernel leaves the permitted and effective sets of a non-root process empty (unless the executable carries file capabilities or the ambient set is populated), so the process cannot actually use them.
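    To make the reasoning above concrete, here is a minimal Python sketch of the execve() transformation rules as I understand them from the capabilities(7) man page. It is a simplified model, not kernel code: the function name and parameters are made up for illustration, a single uid stands in for the real/effective UID check, and the mask 0xAA0425FB is simply the CapInh/CapBnd value from the output in this answer.

```python
# Simplified model of the execve() capability rules from capabilities(7).
# All names are hypothetical; masks are plain integers, one bit per capability.

ALL_CAPS = (1 << 41) - 1  # assumption: capability bits 0..40, as on recent kernels


def caps_after_exec(inheritable, bounding, ambient, uid,
                    file_inheritable=0, file_permitted=0, file_effective=False):
    """Return (permitted, effective) masks for the newly executed program."""
    if uid == 0:
        # For root, the file sets are notionally all ones and the
        # file effective bit is considered set.
        file_inheritable = ALL_CAPS
        file_permitted = ALL_CAPS
        file_effective = True
    elif file_inheritable or file_permitted:
        # A binary that carries file capabilities clears the ambient set.
        ambient = 0

    permitted = ((inheritable & file_inheritable)
                 | (file_permitted & bounding)
                 | ambient)
    effective = permitted if file_effective else ambient
    return permitted, effective


CONTAINER_SET = 0xAA0425FB  # CapInh / CapBnd as reported in /proc/1/status

# Non-root user, plain binary without file capabilities, empty ambient set:
print([hex(m) for m in caps_after_exec(CONTAINER_SET, CONTAINER_SET, 0, uid=1001)])
# Same sets, but the process runs as root:
print([hex(m) for m in caps_after_exec(CONTAINER_SET, CONTAINER_SET, 0, uid=0)])
```

    Under these assumptions the non-root call yields empty permitted and effective masks, while the root call yields the full container set in both, which matches what /proc/1/status reports in each case.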

    This is easy to illustrate. If you run your container with runAsNonRoot set to true and add the capabilities as in the manifest you shared, and then exec into the Pod, you should see those capabilities in the context (note the +i suffix: they are only in the inheritable set, plus the bounding set) with the command:

    $ capsh --print
    Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_time,cap_mknod,cap_audit_write,cap_setfcap+i
    Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_time,cap_mknod,cap_audit_write,cap_setfcap
    

    But you will see CapPrm and CapEff set to 0 for any process run by user 1001:

    $ ps aux
    USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    1001           1  0.0  0.0   4340   760 ?        Ss   14:57   0:00 /bin/sh -c node server.js
    1001           7  0.0  0.5 772128 22376 ?        Sl   14:57   0:00 node server.js
    1001          21  0.0  0.0   4340   720 pts/0    Ss   14:59   0:00 sh
    1001          28  0.0  0.0  17504  2096 pts/0    R+   15:02   0:00 ps aux
    $ grep Cap /proc/1/status
    CapInh: 00000000aa0425fb
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    CapBnd: 00000000aa0425fb
    CapAmb: 0000000000000000
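
    The hex masks in /proc/1/status are not very readable, so here is a small Python sketch that decodes them into capability names. The bit order follows the capability numbers in the kernel's linux/capability.h as I know them (bits 0 to 40); the helper name decode_caps is my own.

```python
# Capability names indexed by bit number (linux/capability.h, bits 0..40).
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid",
    "cap_setpcap", "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]


def decode_caps(hex_mask):
    """Turn a hex mask from /proc/<pid>/status into a list of capability names."""
    mask = int(hex_mask, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


print("CapInh:", ",".join(decode_caps("00000000aa0425fb")))
print("CapEff:", ",".join(decode_caps("0000000000000000")) or "(none)")
```

    Decoding CapInh and CapBnd gives the same list that capsh --print shows, including cap_setuid and cap_sys_time from the manifest, while CapPrm and CapEff decode to nothing. Where the libcap tools are installed, capsh --decode=00000000aa0425fb should give the same answer.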