The man page for user_namespaces(7) says:
The child process created by clone(2) with the CLONE_NEWUSER flag starts out with a complete set of capabilities in the new user namespace. Likewise, a process that creates a new user namespace using unshare(2) or joins an existing user namespace using setns(2) gains a full set of capabilities in that namespace.
Unfortunately, the man page does not clarify which capability set (or sets) will be affected: one or more of the effective set, the permitted set, the inheritable set, etc. So my question here is: which capability sets are affected by clone(2), unshare(2), and setns(2)?
Note: the example section of user_namespaces(7) seems to indicate that the effective and permitted capability sets will be fully enabled, while the inheritable capabilities are all dropped. However, there is no clear indication that this is in fact the implemented behavior. Additionally, there is no indication whether ambient capabilities are affected or not; and I assume that the bounding set is unaffected, not least because capabilities can only ever be dropped from the bounding set.
In order to get an idea of what setns(2) and unshare(2) might do to capabilities, I've created the following tiny Python 3 scripts. Make sure to install the packages `nsenter` and `unshare` (`pip3 install nsenter`, ...) before any attempt to run them.
```python
# usernscaps.py: dump all capability sets of this process
# when entering a specific (grand)child user namespace.
import sys

from nsenter import Namespace


def dumpcaps(s):
    """Print the Cap* lines of /proc/self/status, headed by s."""
    print(s)
    with open('/proc/self/status', 'r') as st:
        for line in st:
            if line.startswith('Cap'):
                print(line.rstrip())


if len(sys.argv) != 2:
    print('usage: usernscaps.py <PID>')
    sys.exit(1)

dumpcaps('initial:')
entered = False
try:
    with Namespace('/proc/%d/ns/user' % int(sys.argv[1]), 'user'):
        entered = True
        dumpcaps('after setns:')
except PermissionError:
    # Switching back to our original user namespace when leaving the
    # with block isn't allowed, so ignore the exception in that case.
    if not entered:
        print('no permission to enter user namespace')
```
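On Python 3.12 and later, the standard library exposes setns(2) directly as os.setns(), so the nsenter package isn't strictly needed. A minimal sketch of an equivalent script, assuming Linux and Python 3.12+ (the file name usernscaps_stdlib.py and the helper names are hypothetical):

```python
# usernscaps_stdlib.py: hypothetical stdlib-only variant of usernscaps.py.
# Assumes Linux and Python 3.12+, where os.setns() and os.CLONE_NEWUSER exist.
from __future__ import annotations

import os
import sys


def dumpcaps(label: str) -> None:
    """Print the Cap* lines of /proc/self/status, headed by label."""
    print(label)
    with open("/proc/self/status") as st:
        for line in st:
            if line.startswith("Cap"):
                print(line.rstrip())


def enter_userns(pid: int) -> None:
    """Join the user namespace of process pid via setns(2)."""
    fd = os.open(f"/proc/{pid}/ns/user", os.O_RDONLY)
    try:
        # Passing CLONE_NEWUSER makes the kernel reject fd unless it refers
        # to a user namespace. The kernel also refuses to move a
        # multithreaded process into another user namespace.
        os.setns(fd, os.CLONE_NEWUSER)
    finally:
        os.close(fd)


def main(pid: int) -> int:
    dumpcaps("initial:")
    try:
        enter_userns(pid)
    except FileNotFoundError:
        print(f"no process with PID {pid}")
        return 1
    except PermissionError:
        print("no permission to enter user namespace")
        return 1
    dumpcaps("after setns:")
    return 0


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.exit(main(int(sys.argv[1])))
    print("usage: usernscaps_stdlib.py <PID>", file=sys.stderr)
```

Unlike the nsenter context manager, this version never tries to switch back to the original user namespace, so the only PermissionError it has to handle is the one raised when entering fails.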
As an ordinary unprivileged user, let us create a new user namespace which will be owned by us, and keep it open with a sleeping process (note the trailing &, which puts it into the background):
```sh
unshare -U bash -c "readlink /proc/self/ns/user && sleep infinity" &
```
Next, run the Python script `usernscaps.py` from above, telling it to enter our newly created user namespace using setns(2); it dumps the capability sets before and after entering:
```sh
python3 usernscaps.py $(lsns -t user | grep "infinity" | awk '{ print $4 }')
```
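The `lsns | grep | awk` pipeline relies on the sleeping process's command line containing "infinity". As a hedged alternative, the following sketch (Linux only, Python 3.7+; names are illustrative) compares the /proc/<pid>/ns/user link of every visible process against our own and prints the processes living in a different user namespace:

```python
# find_userns_pids.py: sketch of a Python alternative to the
# `lsns -t user | grep ... | awk ...` pipeline (Linux only).
# Function names are illustrative, not from any library.
from __future__ import annotations

import os


def userns_id(pid: int | str) -> str | None:
    """Return the user namespace link of pid, e.g. 'user:[4026531837]'.

    Returns None if the process has vanished or we may not inspect it.
    """
    try:
        return os.readlink(f"/proc/{pid}/ns/user")
    except OSError:
        return None


def foreign_userns_pids() -> dict[int, str]:
    """Map PID -> user namespace link for visible processes whose user
    namespace differs from our own."""
    own = userns_id("self")
    result = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            ns = userns_id(entry)
            if ns is not None and ns != own:
                result[int(entry)] = ns
    return result


if __name__ == "__main__":
    for pid, ns in sorted(foreign_userns_pids().items()):
        print(pid, ns)
```

Other sandboxed programs may show up as well; the PID to pass to usernscaps.py is the one whose namespace link matches the value echoed by readlink in the unshare command.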
This gives the following output, even for our unprivileged user and process:
```text
initial:
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
after setns:
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
```
This seems to indicate that setns(2) in fact grants a full set of capabilities not only in the effective set, but also in the permitted set (which makes sense, as the effective set must be a subset of the permitted set at all times). It doesn't seem to top up the inheritable set, though.
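The subset reasoning can be checked mechanically: a mask inner is contained in outer exactly when inner & ~outer == 0. A small sketch using the "after setns:" values above as literals; the ambient check encodes the rule from capabilities(7) that a capability can only be ambient if it is both permitted and inheritable, so an empty inheritable set also forces an empty ambient set:

```python
# Subset checks on the "after setns:" capability masks shown above.
def is_subset(inner: int, outer: int) -> bool:
    """True if every capability bit set in inner is also set in outer."""
    return (inner & ~outer) == 0


after_setns = {
    "CapInh": 0x0000000000000000,
    "CapPrm": 0x0000003FFFFFFFFF,
    "CapEff": 0x0000003FFFFFFFFF,
    "CapBnd": 0x0000003FFFFFFFFF,
    "CapAmb": 0x0000000000000000,
}

checks = {
    # The kernel keeps the effective set within the permitted set.
    "effective within permitted": is_subset(
        after_setns["CapEff"], after_setns["CapPrm"]
    ),
    # capabilities(7): ambient must be within permitted AND inheritable.
    "ambient within permitted & inheritable": is_subset(
        after_setns["CapAmb"], after_setns["CapPrm"] & after_setns["CapInh"]
    ),
    "inheritable empty": after_setns["CapInh"] == 0,
}

for name, ok in checks.items():
    print(f"{name}: {ok}")
```

With the values shown, every check prints True.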
The next script is similar to the previous one, but calls unshare(2) this time:
```python
# usernsunsharecaps.py: dump all capability sets of this process
# upon unsharing the user namespace.
import unshare


def dumpcaps(s):
    """Print the Cap* lines of /proc/self/status, headed by s."""
    print(s)
    with open('/proc/self/status', 'r') as st:
        for line in st:
            if line.startswith('Cap'):
                print(line.rstrip())


dumpcaps('initial:')
# Move this process into a new user namespace, owned by us.
unshare.unshare(unshare.CLONE_NEWUSER)
dumpcaps('after unshare:')
```
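Likewise, os.unshare() (Python 3.12+) can stand in for the unshare package. The sketch below (Linux only; file name and helpers are hypothetical) also lists which Cap* fields changed, instead of leaving the comparison to the eye:

```python
# usernsunsharecaps_stdlib.py: hypothetical stdlib-only variant of
# usernsunsharecaps.py. Assumes Linux and Python 3.12+ (os.unshare).
from __future__ import annotations

import os


def capsets() -> dict[str, str]:
    """Return the Cap* fields of /proc/self/status as {'CapInh': hex, ...}."""
    caps = {}
    with open("/proc/self/status") as st:
        for line in st:
            if line.startswith("Cap"):
                key, _, value = line.partition(":")
                caps[key] = value.strip()
    return caps


def show(label: str, caps: dict[str, str]) -> None:
    """Print caps in the same layout as /proc/self/status."""
    print(label)
    for key, value in caps.items():
        print(f"{key}:\t{value}")


if __name__ == "__main__":
    before = capsets()
    show("initial:", before)
    try:
        os.unshare(os.CLONE_NEWUSER)
    except (AttributeError, OSError) as exc:
        # AttributeError: Python older than 3.12. OSError: e.g. user
        # namespaces disabled, blocked by seccomp, or a multithreaded caller.
        print(f"unshare(CLONE_NEWUSER) not possible here: {exc!r}")
    else:
        after = capsets()
        show("after unshare:", after)
        changed = [key for key in before if before[key] != after.get(key)]
        print("changed sets:", ", ".join(changed) or "none")
```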
Simply run it with `python3 usernsunsharecaps.py`:
```text
initial:
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
after unshare:
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
```
So unsharing, too, gives full permitted and effective capabilities within the new user namespace, while the inheritable and ambient sets stay empty.