Docker supports user namespace remapping, so that a container's user namespace is fully separated from the host's.
The current default behavior ensures that containers get their own user and group management, i.e. their own version of /etc/passwd and /etc/group, but container processes run under the identical UIDs on the host system. This means if your container runs with UID 0 (root), it will also run as root on the host. By the same token, if your container has user "john" with UID 1001 installed and starts its main process with that user, on the host it will also run with UID 1001, which might belong to user "Will" and could also have admin rights.
To make user namespace isolation complete, one needs to enable remapping, which maps the UIDs in the container to different UIDs on the host. So, UID 0 in the container would be mapped to a "non-privileged" UID on the host.
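To make the remapping concrete, here is a minimal sketch of the arithmetic involved. It assumes a single subordinate UID range starting at host UID 100000 with 65536 entries, which is only an illustrative value (the real range comes from the host's /etc/subuid), and the function name map_container_uid is hypothetical.

```python
def map_container_uid(container_uid, host_start=100000, length=65536):
    """Translate a container UID to a host UID under one remap range.

    host_start and length are assumed example values, not Docker defaults
    you can rely on; check /etc/subuid on the actual host.
    """
    if not 0 <= container_uid < length:
        raise ValueError("UID %d is outside the mapped range" % container_uid)
    # The mapping is a plain offset: container UID 0 lands on host_start.
    return host_start + container_uid


# Container root and the "john" user from the example above.
print(map_container_uid(0))
print(map_container_uid(1001))
```

With these assumed values, root in the container ends up as an unprivileged host UID, and "john" no longer collides with whoever owns UID 1001 on the host.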
Is there any support in Kubernetes for this feature to be enabled on the underlying Container Runtime? Will it work out of the box without issues?
No, Kubernetes does not yet support this the way Docker does, as per this (as alluded to in the comments) and this.
However, if you are looking at isolating your workloads there are other alternatives (it's not the same, but the options are pretty good):
You can use Pod Security Policies and specifically you can use RunAsUser, together with AllowPrivilegeEscalation=false. Pod Security Policies can be tied to RBAC so you can restrict how users run their pods.
In other words, you can force your users to run pods only as 'youruser' and disable the privileged flag in the pod securityContext. You can also disable sudo in your container images.
Furthermore, you can drop Linux capabilities, specifically CAP_SETUID. For more advanced isolation, you can use a seccomp profile, SELinux, or an AppArmor profile.
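As a rough illustration of the container-level settings mentioned above, the sketch below builds a Pod manifest as a Python dict and prints it as JSON, which kubectl accepts in place of YAML. This is the per-pod securityContext, not a Pod Security Policy itself; the helper name restricted_pod, the pod name, and the image are hypothetical, and the seccompProfile field assumes a Kubernetes version that supports it in securityContext.

```python
import json


def restricted_pod(name, image, uid):
    """Return a Pod manifest whose container runs with reduced privileges."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": name,
                "image": image,
                "securityContext": {
                    "runAsUser": uid,                   # run as a fixed non-root UID
                    "privileged": False,                # no privileged mode
                    "allowPrivilegeEscalation": False,  # block setuid-style escalation
                    # Capability names are written without the CAP_ prefix.
                    "capabilities": {"drop": ["SETUID"]},
                    # Assumes a cluster version with this field available.
                    "seccompProfile": {"type": "RuntimeDefault"},
                },
            }]
        },
    }


# Hypothetical pod name, image, and UID for illustration only.
manifest = restricted_pod("demo", "example/image:tag", 1001)
print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f. None of this remaps UIDs; it only narrows what the process can do with the UID it has.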
Other alternatives to run untrusted workloads (in alpha as of this writing):