docker fuse

Why is granting the SYS_ADMIN privilege for a Docker container "bad"?


I am running into issues with our security team: engineering teams want to FUSE-mount a filesystem inside a Docker container, but doing so requires the "--cap-add SYS_ADMIN" flag, and security will not allow it.

I have found many articles online that treat passing "--cap-add SYS_ADMIN" to docker run as something to be cautious of, because "SYS_ADMIN by itself grants quite a big part of the capabilities and it could potentially present more attack surface."

However, I cannot find anything that states specifically what these capabilities are or what "attack surface" they present.

What exactly does the SYS_ADMIN flag grant?

What is a practical security risk that is presented by setting this flag?


Solution

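  • This is basically root access to the host: the capability covers mount(2), namespace operations, and a long list of other administrative calls. From the capabilities(7) man page: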

    CAP_SYS_ADMIN
    Note: this capability is overloaded; see Notes to kernel developers, below.

    • perform a range of system administration operations including: quotactl(2), mount(2), umount(2), pivot_root(2), setdomainname(2);
    • perform privileged syslog(2) operations (since Linux 2.6.37, CAP_SYSLOG should be used to permit such operations);
    • perform VM86_REQUEST_IRQ vm86(2) command;
    • perform IPC_SET and IPC_RMID operations on arbitrary System V IPC objects;
    • override RLIMIT_NPROC resource limit;
    • perform operations on trusted and security Extended Attributes (see xattr(7));
    • use lookup_dcookie(2);
    • use ioprio_set(2) to assign IOPRIO_CLASS_RT and (before Linux 2.6.25) IOPRIO_CLASS_IDLE I/O scheduling classes;
    • forge PID when passing socket credentials via UNIX domain sockets;
    • exceed /proc/sys/fs/file-max, the system-wide limit on the number of open files, in system calls that open files (e.g., accept(2), execve(2), open(2), pipe(2));
    • employ CLONE_* flags that create new namespaces with clone(2) and unshare(2) (but, since Linux 3.8, creating user namespaces does not require any capability);
    • call perf_event_open(2);
    • access privileged perf event information;
    • call setns(2) (requires CAP_SYS_ADMIN in the target namespace);
    • call fanotify_init(2);
    • call bpf(2);
    • perform privileged KEYCTL_CHOWN and KEYCTL_SETPERM keyctl(2) operations;
    • perform madvise(2) MADV_HWPOISON operation;
    • employ the TIOCSTI ioctl(2) to insert characters into the input queue of a terminal other than the caller's controlling terminal;
    • employ the obsolete nfsservctl(2) system call;
    • employ the obsolete bdflush(2) system call;
    • perform various privileged block-device ioctl(2) operations;
    • perform various privileged filesystem ioctl(2) operations;
    • perform privileged ioctl(2) operations on the /dev/random device (see random(4));
    • install a seccomp(2) filter without first having to set the no_new_privs thread attribute;
    • modify allow/deny rules for device control groups;
    • employ the ptrace(2) PTRACE_SECCOMP_GET_FILTER operation to dump tracee's seccomp filters;
    • employ the ptrace(2) PTRACE_SETOPTIONS operation to suspend the tracee's seccomp protections (i.e., the PTRACE_O_SUSPEND_SECCOMP flag);
    • perform administrative operations on many device drivers;
    • modify autogroup nice values by writing to /proc/[pid]/autogroup (see sched(7)).
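
To see whether a given container actually holds this capability, one option is to read the effective capability mask that the kernel exposes on the CapEff line of /proc/self/status and test the CAP_SYS_ADMIN bit. The sketch below assumes a Linux host and uses bit index 21 for CAP_SYS_ADMIN, as defined in <linux/capability.h>; the helper names parse_cap_eff and has_capability are made up for illustration and are not part of any Docker or kernel API.

```python
# Minimal sketch: check whether the current process holds CAP_SYS_ADMIN.
# Assumes Linux; helper names are hypothetical.

CAP_SYS_ADMIN = 21  # bit index from <linux/capability.h>


def parse_cap_eff(status_text):
    """Return the effective capability mask (int) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, one bit per capability.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_capability(mask, bit):
    """Return True if the given capability bit is set in the mask."""
    return bool((mask >> bit) & 1)


def main():
    with open("/proc/self/status") as f:
        mask = parse_cap_eff(f.read())
    if has_capability(mask, CAP_SYS_ADMIN):
        print("CAP_SYS_ADMIN is in the effective set")
    else:
        print("CAP_SYS_ADMIN is NOT in the effective set")


if __name__ == "__main__":
    main()
```

Running this inside a container started with and without "--cap-add SYS_ADMIN" should show the difference, which can help when discussing with a security team exactly what a given container has been granted.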