I was trying to run my pod as a non-root user while also granting it some capabilities.
This is my config:
containers:
  - name: container-name
    securityContext:
      capabilities:
        add: ["SETUID", "SYS_TIME"]
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1001
When I deploy my pod and connect to it, I run ps aux and see:
PID USER TIME COMMAND
1 root 0:32 node bla.js
205 root 0:00 /bin/bash
212 root 0:00 ps aux
I then run cat /proc/1/status and see:
CapPrm: 0000000000000000
CapEff: 0000000000000000
This means the container's process has no capabilities.
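To run the same check without reading the hex values by eye, here is a minimal Python sketch that pulls the Cap* lines out of a /proc/<pid>/status dump. The function name parse_cap_lines is hypothetical, and the sample text only repeats the two lines shown above.

```python
def parse_cap_lines(status_text):
    """Return {'CapPrm': int, ...} for every Cap* line in a /proc/<pid>/status dump."""
    caps = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        # Keep only the capability lines (CapInh, CapPrm, CapEff, CapBnd, CapAmb).
        if sep and name.startswith("Cap"):
            caps[name] = int(value.strip(), 16)  # masks are printed as hex
    return caps


# Sample input copied from the output above; on a real system you would
# read the text from /proc/1/status instead.
sample = "CapPrm: 0000000000000000\nCapEff: 0000000000000000\n"
caps = parse_cap_lines(sample)
print(caps)
print("no capabilities" if all(mask == 0 for mask in caps.values()) else "has capabilities")
```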
However, if I remove the runAsNonRoot: true flag from the securityContext, I can see that the process does have multiple capabilities.
Is there a way to run a pod as a non-root user and still add some capabilities?
This is the expected behavior. Capabilities are meant to divide the privileges traditionally associated with the superuser (root) into distinct units. A non-root user cannot enable or disable such capabilities, since that could create a security breach.
The capabilities field in the securityContext key is designed to manage (either limit or expand) the Linux capabilities for the container's context. In a pod run as root, the container's processes inherit these capabilities because they are owned by the root user. If the pod is run as a non-root user, however, it does not matter that the context has those capabilities enabled: the processes end up with empty permitted and effective sets, because the Linux kernel will not allow a non-root user to grant capabilities to a process.
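The difference between the root and non-root cases can be sketched with the capability transformation that capabilities(7) documents for execve. What follows is a deliberately simplified Python model, not the kernel's implementation: it ignores set-user-ID binaries and securebits, and the function and parameter names are hypothetical. It assumes the container binary has no file capabilities, which is the usual case.

```python
ALL_CAPS = (1 << 64) - 1  # stand-in for "every capability bit set"


def caps_after_exec(inheritable, bounding, ambient=0, is_root=False,
                    file_inheritable=0, file_permitted=0, file_effective=False):
    """Simplified model of the execve() rules in capabilities(7).

    Returns (permitted, effective) for the new program as integer bitmasks.
    """
    # A file that carries capabilities is "privileged" and drops the ambient set.
    file_has_caps = bool(file_inheritable or file_permitted)
    new_ambient = 0 if file_has_caps else ambient

    if is_root:
        # For UID 0 the kernel treats the file sets as fully populated.
        file_inheritable = ALL_CAPS
        file_permitted = ALL_CAPS
        file_effective = True

    permitted = ((inheritable & file_inheritable)
                 | (file_permitted & bounding)
                 | new_ambient)
    effective = permitted if file_effective else new_ambient
    return permitted, effective


mask = 0xAA0425FB  # inheritable and bounding sets granted to the container

# Non-root user, binary without file capabilities: both sets come out empty.
print([hex(m) for m in caps_after_exec(mask, mask, is_root=False)])

# Root: the same inheritable/bounding sets become permitted and effective.
print([hex(m) for m in caps_after_exec(mask, mask, is_root=True)])
```

Under these assumptions the non-root call yields empty permitted and effective sets, while the root call yields the full mask in both.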
This is easy to illustrate. If you run your container with the key runAsNonRoot set to true, add the capabilities as you did in the manifest you shared, and then exec into the Pod, you should see those capabilities in the context (in the inheritable and bounding sets) with the command:
$ capsh --print
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_time,cap_mknod,cap_audit_write,cap_setfcap+i
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_sys_chroot,cap_sys_time,cap_mknod,cap_audit_write,cap_setfcap
But you will see CapPrm and CapEff set to 0x0 in any process run by user 1001:
$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1001 1 0.0 0.0 4340 760 ? Ss 14:57 0:00 /bin/sh -c node server.js
1001 7 0.0 0.5 772128 22376 ? Sl 14:57 0:00 node server.js
1001 21 0.0 0.0 4340 720 pts/0 Ss 14:59 0:00 sh
1001 28 0.0 0.0 17504 2096 pts/0 R+ 15:02 0:00 ps aux
$ grep Cap /proc/1/status
CapInh: 00000000aa0425fb
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 00000000aa0425fb
CapAmb: 0000000000000000
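To read these masks, each bit position corresponds to one capability number from the kernel's linux/capability.h header. Here is a small Python sketch that decodes a mask into names. The helper name decode_caps is hypothetical, and the table covers capabilities 0 through 40 only.

```python
# Capability names indexed by bit number, following linux/capability.h.
CAP_NAMES = [
    "cap_chown", "cap_dac_override", "cap_dac_read_search", "cap_fowner",
    "cap_fsetid", "cap_kill", "cap_setgid", "cap_setuid", "cap_setpcap",
    "cap_linux_immutable", "cap_net_bind_service", "cap_net_broadcast",
    "cap_net_admin", "cap_net_raw", "cap_ipc_lock", "cap_ipc_owner",
    "cap_sys_module", "cap_sys_rawio", "cap_sys_chroot", "cap_sys_ptrace",
    "cap_sys_pacct", "cap_sys_admin", "cap_sys_boot", "cap_sys_nice",
    "cap_sys_resource", "cap_sys_time", "cap_sys_tty_config", "cap_mknod",
    "cap_lease", "cap_audit_write", "cap_audit_control", "cap_setfcap",
    "cap_mac_override", "cap_mac_admin", "cap_syslog", "cap_wake_alarm",
    "cap_block_suspend", "cap_audit_read", "cap_perfmon", "cap_bpf",
    "cap_checkpoint_restore",
]


def decode_caps(mask_hex):
    """Return the names of the capabilities whose bits are set in a hex mask."""
    mask = int(mask_hex, 16)
    return [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]


print(decode_caps("00000000aa0425fb"))  # CapInh / CapBnd from the output above
print(decode_caps("0000000000000000"))  # CapPrm / CapEff: no bits set
```

Decoding 00000000aa0425fb gives the same 15 capabilities that capsh --print lists, including cap_setuid and cap_sys_time, while the all-zero CapPrm and CapEff masks decode to an empty list.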