
No direct packet access in BPF program with just CAP_BPF?


Up until Linux 5.8, CAP_SYS_ADMIN was required to load any but the most basic BPF programs. The recently introduced CAP_BPF is a welcome addition, as it allows software that leverages BPF to run with fewer privileges.

Certain types of BPF programs can access packet data. The pre-4.7 way of doing so is via the bpf_skb_load_bytes() helper. As the verifier got smarter, it became possible to perform "direct packet access", i.e. to access packet bytes by following pointers in the context structure. For example:

static const struct bpf_insn prog[] = {
    // BPF_PROG_TYPE_SK_REUSEPORT: gets a pointer to sk_reuseport_md (r1).

    // Get packet data pointer (r2) and ensure length >= 2, goto Drop otherwise
    BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,
                offsetof(struct sk_reuseport_md, data)),
    BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_1,
                offsetof(struct sk_reuseport_md, data_end)),
    BPF_MOV64_REG(BPF_REG_4, BPF_REG_2),
    BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, 2),
    BPF_JMP_REG(BPF_JGT, BPF_REG_4, BPF_REG_3, /* Drop: */ +4),

    // Ensure first 2 bytes are 0, goto Drop otherwise
    BPF_LDX_MEM(BPF_H, BPF_REG_4, BPF_REG_2, 0),
    BPF_JMP_IMM(BPF_JNE, BPF_REG_4, 0, /* Drop: */ +2),

    // return SK_PASS
    BPF_MOV32_IMM(BPF_REG_0, SK_PASS),
    BPF_EXIT_INSN(),

    // Drop: return SK_DROP
    BPF_MOV32_IMM(BPF_REG_0, SK_DROP),
    BPF_EXIT_INSN()
};

The program must explicitly check that the accessed bytes are within bounds. The verifier rejects the program otherwise.
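
For readers more comfortable with C than with raw instructions, the logic of the program above can be modelled as an ordinary userspace function. This is only a sketch of the control flow, not a loadable BPF program: `classify_packet`, `VERDICT_PASS` and `VERDICT_DROP` are hypothetical names standing in for the program and for SK_PASS / SK_DROP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for SK_DROP / SK_PASS from <linux/bpf.h>. */
enum verdict { VERDICT_DROP = 0, VERDICT_PASS = 1 };

/*
 * Userspace model of the instruction sequence above.
 * data and data_end play the role of sk_reuseport_md->data and ->data_end.
 */
static enum verdict classify_packet(const uint8_t *data,
                                    const uint8_t *data_end)
{
    /*
     * Bounds check. The BPF program expresses this as
     * "data + 2 > data_end"; the length form used here avoids forming
     * an out-of-range pointer in portable C.
     */
    if ((size_t)(data_end - data) < 2)
        return VERDICT_DROP;

    /* Only now is it safe to read: pass if the first 2 bytes are zero. */
    if (data[0] != 0 || data[1] != 0)
        return VERDICT_DROP;

    return VERDICT_PASS;
}
```

In the BPF program, the comparison in the first `if` is exactly the instruction that the verifier rejects in the log below.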

The program above loads successfully if the caller holds CAP_SYS_ADMIN. Supposedly, CAP_BPF should suffice as well, but it doesn't (Linux 5.13). Earlier kernels behave the same way. The verifier output follows:

Permission denied
0: (79) r2 = *(u64 *)(r1 +0)
1: (79) r3 = *(u64 *)(r1 +8)
2: (bf) r4 = r2
3: (07) r4 += 2
4: (2d) if r4 > r3 goto pc+4
R3 pointer comparison prohibited
processed 5 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0

I understand that arbitrary pointer comparison is restricted because it reveals kernel memory layout. However, comparing a packet data pointer, offset by a constant, against the packet end pointer is safe.

I'd like to find a way to load the program without granting CAP_SYSADMIN.

  1. Is there a way to write the bounds check that doesn't trigger the pointer comparison error?

    The relevant code is in check_cond_jmp_op(). It looks like one can't get away with pointer comparison, even with the latest kernel version.

  2. If there's no way to write the bounds check that keeps the verifier happy, I wonder if lifting the limitation is on the roadmap.

  3. As a workaround, I can grant CAP_PERFMON on top of CAP_BPF, removing the "embargo" on pointer comparison. The program then loads successfully. I can probably restrict perf_event_open() and other superfluous bits with seccomp. It doesn't feel nice, though.
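
A loader relying on this workaround could check its effective capability set before attempting to load the program. The following is a minimal sketch that assumes the raw capget() syscall is available. `has_effective_cap` and `can_load_direct_packet_access` are hypothetical helper names, and the check only mirrors the combinations discussed here (CAP_SYS_ADMIN, or CAP_BPF plus CAP_PERFMON), not the kernel's full policy.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/capability.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Older headers may lack these; the values are those used since Linux 5.8. */
#ifndef CAP_PERFMON
#define CAP_PERFMON 38
#endif
#ifndef CAP_BPF
#define CAP_BPF 39
#endif

/* Returns 1 if cap is in the effective set, 0 if not, -1 if capget() fails. */
static int has_effective_cap(int cap)
{
    struct __user_cap_header_struct hdr = {
        .version = _LINUX_CAPABILITY_VERSION_3,
        .pid = 0, /* 0 means the calling thread */
    };
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3] = {{0}};

    if (syscall(SYS_capget, &hdr, data) != 0)
        return -1;

    return (data[CAP_TO_INDEX(cap)].effective & CAP_TO_MASK(cap)) != 0;
}

/*
 * Hypothetical helper: 1 if the caller holds CAP_SYS_ADMIN, or both
 * CAP_BPF and CAP_PERFMON, in its effective set; 0 otherwise.
 */
static int can_load_direct_packet_access(void)
{
    if (has_effective_cap(CAP_SYS_ADMIN) == 1)
        return 1;
    return has_effective_cap(CAP_BPF) == 1 &&
           has_effective_cap(CAP_PERFMON) == 1;
}
```

A result of 0 only indicates that the load is likely to fail with the "pointer comparison prohibited" error above; a result of 1 does not guarantee that the verifier accepts the program.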

Reproducer.


Solution

  • To make direct packet accesses in your program, you will need CAP_PERFMON in addition to CAP_BPF. I'm not aware of any way around it.

    Why?

    Because of Spectre vulnerabilities, someone able to perform arithmetic on unbounded pointers (i.e., all except stack and map value pointers) can read arbitrary memory via speculative out-of-bounds loads.

    Such operations thus need to be forbidden for unprivileged users. Allowing CAP_BPF users to perform those operations would essentially give read access to arbitrary memory to CAP_BPF. For those reasons, I doubt this limitation will be lifted in the future.
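
    To make the concern concrete, here is the general shape of a bounds-check-bypass (Spectre variant 1) gadget, written as plain userspace C rather than BPF. It is an illustrative sketch only: `gadget_shape`, `probe` and the 64-byte stride (an assumed cache-line size) are hypothetical, and the code performs no attack by itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical probe array: one assumed 64-byte cache line per byte value. */
static uint8_t probe[256 * 64];

/*
 * Architecturally, this function never reads outside data[0..len).
 * Under branch misprediction, however, the CPU may execute the body
 * speculatively with off >= len.
 */
static uint8_t gadget_shape(const uint8_t *data, size_t len, size_t off)
{
    if (off < len) {
        /* May run speculatively with an out-of-bounds off. */
        uint8_t value = data[off];

        /*
         * Dependent load: which cache line gets touched depends on
         * value, so a speculatively loaded byte can leave a footprint
         * in the cache even though the result is later discarded.
         */
        return probe[(size_t)value * 64];
    }
    return 0;
}
```

    The bounds check is correct in the architectural sense, which is why a verifier that only reasons about architectural behaviour would accept it. The restriction on pointer arithmetic and comparison for callers without CAP_PERFMON exists because the check does not constrain what the CPU does speculatively.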