
How the eBPF stack works


I am currently writing an XDP program with a number of function calls, so the stack fills up as the program executes. However, I am confused as to how the stack size is measured. For example, let's say that I call the following function:

function_test(&value_a);

Here, value_a is an instance of the following structure (I'm skipping value5 to value99):

struct struct_test {
   __u64 value;
   __u64 value1;
   __u64 value2;
   __u64 value3;
   __u64 value4;
   ...
   __u64 value100;
};

Even though I pass only the address of an instance of this structure, the eBPF verifier reports that the program exceeds the stack limit. Why is that?

I'm aware that a per-CPU array might be a good way to store large values that are passed between functions, but I'd prefer to have as few lookups as possible.

UPDATE: another question. In the following example:

function_test(&value_a);
function_test(&value_b);

where value_a and value_b are instances of a shorter struct (for example, one containing value1 to value50), the verifier reports that the stack limit is being reached, as if the values from the first call weren't popped before the second call. Why does the eBPF stack work like that?

I thought the problem might be caused by making these calls inside a function called by bpf_loop, but it also seems to happen in the main XDP program.


Solution

  • However, I am confused as to how the stack size is measured.

    It all comes down to the generated bytecode. The compiler generates bytecode including stack accesses. You can think of the BPF stack as an array of typically 512 bytes where the stack frame pointer points to the end and always lives in the R10 register.

    So when you inspect your BPF program with something like llvm-objdump -d <elf-file> you will see instructions such as *(u64 *)(r10 - 16) = r1. The compiler determines which parts of the stack it is going to use and when it can reuse slots. It behaves more like a software-managed stack than a typical hardware one: there are no push/pop instructions, only loads and stores at offsets relative to R10.

    The verifier simply checks whether any instruction accesses the stack beyond R10-512 (that is, at an offset below -512), in which case it rejects the program. I believe clang also emits a warning or error when it detects that a function will exceed the stack size.
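
To make the arithmetic behind this concrete, here is a minimal sketch in plain userspace C, not BPF code. It is a toy model, not the verifier's actual logic: `BPF_STACK_LIMIT`, `big_model`, `small_model`, `fits_on_stack` and `stack_access_ok` are hypothetical names made up for illustration, and the two structs only stand in for the ones in the question (the individual `__u64` fields are modelled as arrays).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stack limit discussed above (the kernel names it MAX_BPF_STACK). */
#define BPF_STACK_LIMIT 512

/* Hypothetical stand-ins for the structs in the question. */
struct big_model   { uint64_t v[101]; }; /* value, value1 ... value100 */
struct small_model { uint64_t v[50]; };  /* value1 ... value50 */

/*
 * Toy model: does an object of `size` bytes still fit on the stack
 * when `used` bytes are already occupied by other locals?
 */
static int fits_on_stack(size_t used, size_t size)
{
    assert(used <= BPF_STACK_LIMIT);
    return size <= BPF_STACK_LIMIT - used;
}

/*
 * Toy model of the bounds check: an access of `size` bytes at r10 + off
 * is fine only if it starts at or above r10 - 512 and ends at or below r10.
 */
static int stack_access_ok(int32_t off, uint32_t size)
{
    assert(size > 0);
    return off >= -BPF_STACK_LIMIT && (int64_t)off + (int64_t)size <= 0;
}
```

Under these assumptions, `sizeof(struct big_model)` is 808 bytes, so `fits_on_stack(0, sizeof(struct big_model))` is false: passing `&value_a` only passes a pointer, but the instance it points to is still a local variable in the caller's stack frame. Likewise, two 400-byte `small_model` locals need 800 bytes if both occupy their own slots, so `fits_on_stack(400, 400)` is also false; whether the compiler can overlap their slots depends on their lifetimes and is not guaranteed. An 8-byte store at `r10 - 16` passes `stack_access_ok(-16, 8)`, while one at `r10 - 520` does not.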