Why is unprivileged recursive unshare(CLONE_NEWUSER) not permitted?

I'm on Ubuntu 17.04.

A single unprivileged unshare of the mount namespace works. You can try it with the unshare(1) command:

$ unshare -m -U /bin/sh
#

However, an unshare within an unshare is not permitted:

$ unshare -m -U /bin/sh
# unshare -m -U /bin/sh
unshare: Operation not permitted
#

Here is a C program that will basically do the same:

#define _GNU_SOURCE
#include <stdio.h>
#include <sched.h>

int
main(int argc, char *argv[])
{
    /* First level: new user + mount namespace. This succeeds. */
    if(unshare(CLONE_NEWUSER|CLONE_NEWNS) == -1) {
        perror("unshare");
        return 1;
    }
    /* Second level, with no uid_map/gid_map written in between.
       This is the call that fails with EPERM. */
    if(unshare(CLONE_NEWUSER|CLONE_NEWNS) == -1) {
        perror("unshare2");
        return 1;
    }
    return 0;
}

Why is it not permitted? Where can I find documentation about this? I could not find this information in the unshare or clone man pages, or in the kernel's unshare documentation.

Is there a system setting that would allow this?

What I want to achieve:

First unshare: I want to mask a few binaries on the system with my own versions.

Second unshare: unprivileged chroot.

Solution

I'm somewhat guessing here, but I think that the reason is the UID mapping. In order to perform it, certain conditions must be met (from the user_namespaces man page):

   In order for a process to write to the /proc/[pid]/uid_map (/proc/[pid]/gid_map) file, all of the following
   requirements must be met:

   1. The writing process must have the CAP_SETUID (CAP_SETGID) capability in the user namespace of the process pid.

   2. The writing process must either be in the user namespace of the process pid or be in the parent user namespace of
      the process pid.

   3. The mapped user IDs (group IDs) must in turn have a mapping in the parent user namespace.

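I believe what happens is this: the first unshare succeeds because your UID and GID have a mapping in the initial user namespace. Afterwards, however, the process sits in a new user namespace whose uid_map and gid_map have not been written (unshare -U without -r does not write them), so its effective UID and GID are unmapped there. That namespace is the parent of the one the second unshare tries to create, so the second system call fails.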

From the unshare(2) manual page:

   EPERM  CLONE_NEWUSER was specified in flags, but either the effective user ID or the effective group ID of the caller
          does not have a mapping in the parent namespace (see user_namespaces(7)).