[SOLVED] What does the endbr64 instruction actually do?

I've been trying to understand assembly language code generated by GCC and frequently encounter this instruction at the start of many functions including _start(), but couldn't find any guide explaining its purpose:

0000000000001040 <_start>:
    1040:    f3 0f 1e fa             endbr64
    1044:    31 ed                   xor    ebp,ebp

Solution

TL;DR: endbr64 can be the target of an indirect branch without faulting.
When CET is enabled, an indirect jump or call that lands on any other instruction faults.

Putting it at the start of every function allows each of them to be called through a function pointer. (This is also how calls into shared libraries work: they go through an indirect jump in the PLT.)

endbr64 (and endbr32) are a part of Intel's Control-Flow Enforcement Technology (CET) (see also Intel Software Developer Manual, Volume 1, Chapter 18).

Intel CET offers hardware protection against Return-oriented Programming (ROP) and Jump/Call-oriented Programming (JOP/COP) attacks, which manipulate control flow in order to re-use existing code for malicious purposes.

Its two major features are

a shadow stack for tracking return addresses and
indirect branch tracking, which endbr64 is a part of.

While CET-capable processors are only gradually becoming available, GCC has supported it since version 8 through the -fcf-protection option. Some distribution builds of GCC (Ubuntu's, for example) enable that option by default, which is why endbrXX instructions show up in ordinary compiler output. The opcode was chosen to be a no-op on older processors, so the instruction is ignored if CET is not supported; the same happens on CET-capable processors where indirect branch tracking is disabled.
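The byte sequence shown in the question's disassembly (f3 0f 1e fa) can be checked for directly. Here is a minimal Python sketch, assuming you already have a function's raw machine-code bytes at hand. The helper name starts_with_endbr is made up for illustration, and the endbr32 encoding (f3 0f 1e fb) comes from Intel's documentation rather than from the listing above.

```python
# Encodings of the two end-branch instructions.
ENDBR64 = bytes.fromhex("f30f1efa")  # as seen at <_start> in the question
ENDBR32 = bytes.fromhex("f30f1efb")  # 32-bit counterpart (not in the listing)


def starts_with_endbr(code: bytes) -> bool:
    """Hypothetical helper: True if code begins with endbr64 or endbr32."""
    return code.startswith((ENDBR64, ENDBR32))


# First six bytes of _start from the listing: endbr64, then xor ebp,ebp.
start_bytes = bytes.fromhex("f30f1efa31ed")
print(starts_with_endbr(start_bytes))      # True
print(starts_with_endbr(start_bytes[4:]))  # False: starts at xor ebp,ebp
```

This only tells you whether a location is a valid landing pad; it says nothing about whether the CPU is actually enforcing the check.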

So what does endbr64 do?

Preconditions:

CET must be enabled by setting the control register flag CR4.CET to 1.
The appropriate flags for indirect branch tracking in the IA32_U_CET (user mode) or IA32_S_CET (supervisor mode) MSR must be set.
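The two preconditions boil down to a pair of bit tests. The sketch below is only an illustration under stated assumptions: the bit positions (CR4.CET as bit 23, ENDBR_EN as bit 2 of the CET MSRs) are as I recall them from the Intel SDM and should be double-checked there, the register values are made-up inputs, and the function name ibt_enabled is hypothetical. Reading the real CR4 or MSRs requires kernel privileges and is not shown.

```python
# Assumed bit positions; verify against the Intel SDM before relying on them.
CR4_CET = 1 << 23       # CR4.CET: master enable for CET
CET_ENDBR_EN = 1 << 2   # ENDBR_EN in IA32_U_CET / IA32_S_CET


def ibt_enabled(cr4: int, cet_msr: int) -> bool:
    """Hypothetical check: both preconditions for indirect branch tracking hold."""
    return bool(cr4 & CR4_CET) and bool(cet_msr & CET_ENDBR_EN)


# Made-up register values, purely for illustration.
print(ibt_enabled(cr4=1 << 23, cet_msr=0b100))  # True
print(ibt_enabled(cr4=0, cet_msr=0b100))        # False: CR4.CET is clear
print(ibt_enabled(cr4=1 << 23, cet_msr=0b001))  # False: ENDBR_EN is clear
```

Pass the value of IA32_U_CET for user-mode code and IA32_S_CET for supervisor-mode code.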

The CPU sets up a small state machine which tracks the type of the last branch. Take the following example:

some_function:
    mov rax, qword [vtable+8]
    call rax
    ...

check_login:
    endbr64
    ...
authenticated:
    mov byte [is_admin], 1
    ...
    ret

Let's now briefly look at two scenarios.

No attack:

some_function retrieves the address of the virtual method check_login from the virtual method table vtable and calls it.
Since this is an indirect call, the CET state machine is activated and set to trigger on the next instruction (TRACKER = WAIT_FOR_ENDBRANCH).
The next instruction is endbr64, so the indirect call is considered "safe" and execution continues (the endbr64 still behaves as a no-op). The state machine is reset (TRACKER = IDLE).

Attack:
An attacker somehow managed to manipulate vtable such that vtable+8 now points to authenticated.

some_function retrieves the address of authenticated from the virtual method table vtable and calls it.
Since this is an indirect call, the CET state machine is activated and set to trigger on the next instruction (TRACKER = WAIT_FOR_ENDBRANCH).
The next instruction is mov byte [is_admin], 1, not the expected endbr64 instruction. The CET state machine infers that control flow was manipulated and raises a #CP fault, terminating the program.

Without CET, the control flow manipulation would have gone unnoticed and the attacker would have obtained admin privileges.

In summary, the indirect branch tracking feature of Intel CET ensures that indirect calls and jumps can only redirect to functions which start with an endbr64 instruction.

Note that this does not ensure that the right function is called: if an attacker redirects control flow to a different function that also starts with endbr64, the state machine won't complain and the program will keep executing. However, this still greatly reduces the attack surface, as most JOP/COP attacks target instructions mid-function (or even jump right "into" instructions).
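The two scenarios and the caveat above can be replayed with a toy model. This is a sketch of the idea in Python, not of how the hardware is implemented: the addresses, the MEMORY table, the third function at 0x200, and the names indirect_call and ControlProtectionFault are all invented for illustration.

```python
from enum import Enum


class Tracker(Enum):
    IDLE = 0
    WAIT_FOR_ENDBRANCH = 1


class ControlProtectionFault(Exception):
    """Stands in for the #CP fault."""


# Toy "memory": address -> instruction text, mirroring the example listing.
# The addresses are arbitrary.
MEMORY = {
    0x100: "endbr64",                 # check_login
    0x104: "mov byte [is_admin], 1",  # authenticated
    0x200: "endbr64",                 # some unrelated function (hypothetical)
}


def indirect_call(target: int) -> Tracker:
    """Model one indirect call and the check on the instruction it lands on."""
    tracker = Tracker.WAIT_FOR_ENDBRANCH  # armed by the indirect call
    if MEMORY[target] != "endbr64":
        raise ControlProtectionFault(f"#CP at {target:#x}")
    tracker = Tracker.IDLE                # endbr64 seen: tracker is reset
    return tracker


print(indirect_call(0x100).name)          # IDLE: the legitimate call
try:
    indirect_call(0x104)                  # the attack from the example
except ControlProtectionFault as fault:
    print(fault)                          # #CP at 0x104
print(indirect_call(0x200).name)          # IDLE: wrong function, no fault
```

The last call shows the limitation: any address holding an endbr64 is accepted, whether or not it is the function the programmer intended.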