go mount linux-namespaces unshare

Filesystem mounted in a mount namespace is visible in root namespace


I have a program that creates a mount namespace with an unshare(CLONE_NEWNS) syscall, then mounts a filesystem at /path/to/my/mount using a mount syscall with mountflags=0. The mount succeeds and I can see the filesystem at /path/to/my/mount from a shell running in the mount namespace, but the mount also shows up in the root mount namespace, i.e. from an ordinary shell running on my computer I can see the contents of /path/to/my/mount. I was expecting the mount to be visible only from within the new mount namespace.

The Go code that I'm using is this:

// lock this goroutine to a single OS thread because namespaces are thread-local
runtime.LockOSThread()
defer runtime.UnlockOSThread()

// switch to a new mount namespace
err := unix.Unshare(unix.CLONE_NEWNS)
if err != nil {
    return fmt.Errorf("error entering new mount namespace: %w", err)
}

mountopts := fmt.Sprintf("lowerdir=%s,upperdir=%s,workdir=%s", lower, upper, work)
err = unix.Mount("overlay", target, "overlay", 0, mountopts)
if err != nil {
    return fmt.Errorf("error mounting overlay filesystem: %w", err)
}

How can I make it so that the filesystem mounted within the mount namespace is private to that namespace?

Is this to do with mount propagation between mount namespaces? I tried changing / in the mount namespace to be a private mount by inserting the following between the unshare and mount syscalls above:

// make the root filesystem in this new namespace private
err = unix.Mount("ignored", "/", "ignored", unix.MS_PRIVATE, "ignored")
if err != nil {
    return fmt.Errorf("error making root filesystem private: %w", err)
}

This didn't work. / did become private in the mount namespace (according to /proc/self/mountinfo), but it also became private in the root mount namespace, and the mount I created at /path/to/my/mount was still visible in both namespaces.


Solution

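  • The solution was to add the MS_REC flag when changing the propagation type of "/" to private. Without MS_REC, only the mount at "/" itself is changed; mounts beneath it keep their existing propagation type, so if the target lives on a separate mount below "/", that mount can still be shared. The relevant syscall, which must come after the unshare and before the mount of the new filesystem, is: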

    unix.Mount("ignored", "/", "ignored", unix.MS_PRIVATE|unix.MS_REC, "ignored")
    

    Per the mount(2) man page, Linux ignores the first, third, and fifth parameters of this call when it only changes the propagation type. The "ignored" strings are there just to signal this to people reading the code and can be replaced with any string.

    Full working solution:

    // lock this goroutine to a single OS thread because namespaces are thread-local
    runtime.LockOSThread()
    defer runtime.UnlockOSThread()
    
    // switch to a new mount namespace
    err := unix.Unshare(unix.CLONE_NEWNS | unix.CLONE_FS)
    if err != nil {
        return fmt.Errorf("error unsharing mounts: %w", err)
    }
    
    // make the root filesystem in this new namespace private, which prevents the
    // mount below from leaking into the parent namespace
    // per the man page, the first, third, and fifth arguments below are ignored
    err = unix.Mount("ignored", "/", "ignored", unix.MS_PRIVATE|unix.MS_REC, "ignored")
    if err != nil {
        return fmt.Errorf("error making root filesystem private: %w", err)
    }
    
    // mount an overlay filesystem; the equivalent shell command would be:
    // sudo mount -t overlay overlay -olowerdir=$(pwd)/lower,upperdir=$(pwd)/upper,workdir=$(pwd)/work $(pwd)/merged
    err = unix.Mount("overlay", target, "overlay", 0, mountopts)
    if err != nil {
        return fmt.Errorf("error mounting overlay filesystem: %w", err)
    }
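
To check whether the fix took effect, it can help to look at the propagation type of the mount that actually contains the target, since propagation is a per-mount property. The following is a minimal sketch, not part of the original solution: it reads /proc/self/mountinfo, finds the mount containing a given path by longest-prefix match, and reports that mount's propagation type based on the optional fields (shared:N, master:N, unbindable). The names mountEntry, parseMountinfoLine, and containingMount are hypothetical helpers made up for this illustration, and the sketch does not unescape octal sequences such as \040 in mount points.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountEntry (hypothetical type) holds the two mountinfo fields this sketch uses.
type mountEntry struct {
	mountPoint  string
	propagation string // "shared", "slave", "unbindable", or "private"
}

// parseMountinfoLine parses one line of /proc/<pid>/mountinfo.
// Field 5 is the mount point; optional fields start at field 7 and end at "-".
func parseMountinfoLine(line string) (mountEntry, error) {
	fields := strings.Fields(line)
	if len(fields) < 10 {
		return mountEntry{}, fmt.Errorf("malformed mountinfo line: %q", line)
	}
	var shared, slave, unbindable bool
	for _, f := range fields[6:] {
		if f == "-" {
			break // end of optional fields
		}
		switch {
		case strings.HasPrefix(f, "shared:"):
			shared = true
		case strings.HasPrefix(f, "master:"):
			slave = true
		case f == "unbindable":
			unbindable = true
		}
	}
	entry := mountEntry{mountPoint: fields[4], propagation: "private"}
	switch {
	case shared: // a mount that is both shared and slave is reported as shared here
		entry.propagation = "shared"
	case slave:
		entry.propagation = "slave"
	case unbindable:
		entry.propagation = "unbindable"
	}
	return entry, nil
}

// isUnder reports whether path is the mount point itself or lies below it.
func isUnder(path, mountPoint string) bool {
	if mountPoint == "/" {
		return strings.HasPrefix(path, "/")
	}
	return path == mountPoint || strings.HasPrefix(path, mountPoint+"/")
}

// containingMount returns the entry with the longest mount point containing path.
// On ties, the later entry wins, since it was mounted on top of the earlier one.
func containingMount(entries []mountEntry, path string) (mountEntry, bool) {
	var best mountEntry
	found := false
	for _, e := range entries {
		if !isUnder(path, e.mountPoint) {
			continue
		}
		if !found || len(e.mountPoint) >= len(best.mountPoint) {
			best, found = e, true
		}
	}
	return best, found
}

func main() {
	path := "/"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	f, err := os.Open("/proc/self/mountinfo")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer f.Close()

	var entries []mountEntry
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		e, err := parseMountinfoLine(scanner.Text())
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		entries = append(entries, e)
	}
	if e, ok := containingMount(entries, path); ok {
		fmt.Printf("%s is on mount %s (%s)\n", path, e.mountPoint, e.propagation)
	}
}
```

Running this with the target's parent directory as the argument, once from the root namespace and once from inside the new namespace, should show whether the containing mount is still shared there. If it reports "shared" inside the new namespace after the MS_PRIVATE call, the propagation change most likely did not reach that mount, which is the situation MS_REC is meant to address.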