I know a little something about branch prediction. This happens at the CPU and has nothing to do with compilation. Although you might be able to tell the compiler if one branch is more likely than the other e.g. in C++20 via [[likely]]
and [[unlikely]]
(see cppreference) this is seperate to the branch prediction the CPU performs (see Can I improve branch prediction with my code?).
As far as I know when I have e.g. a loop (with an exit condition) the CPU will predict that the exit condition will not be met and try to perform some operations inside the loop even if the condition has not been checked yet. If the CPU predicts correct it saves some time and everything is fine. However what happens if it is not able to predict it correct? I know that this will be a performance hit but I don't know if some already done operations are discareded or reversed or just how it handles it.
Now I have come up with two simple examples. The first one (if we ignore that the compiler might just calculate the sum at compile time and I asume that no optimisations occure) should be very easy to predict for the CPU. The loop condition will be the same for the whole time and the condition in the loop only switches once. This means that the prediction will get us a nice performance boost and even if it fails a few times, adding a number can easily be reversed.
In the second example the exit condition is again easy to predict. In the loops body I allocate a new int
array via malloc
. Note that I don't free it on purpose since I want the allocation to succeed for a long time so the CPU predicts this success. At some time the allocation will fail when I run out of memory (I didn't calculate the total memory consumption and suppose that memory will not be moved to disk) or some other error occurs. This means that ptr
will be NULL
and dereferencing it is UB. It is not defined what happens, it could just be a no-op, crash my program or cause my PC to fly away. Therefore I conclude that the CPU can not just undo that and I am wondering what happens.
#include <stdlib.h>
#define VERSION 1
#if VERSION == 1
int main() {
size_t sum = 0ull;
for (size_t i = 0ull, max = 1'000ull; i < max; ++i) {
if (i < (max / 2)) {
sum += 2 * i;
}
else {
sum += i;
}
}
return 0;
}
#else
int main() {
int* ptr = NULL;
for (size_t i = 0ull, max = 1'000'000ull; i < max; ++i) {
ptr = (int*)malloc((sizeof * ptr) * 1'000ull);
if (ptr) {
*ptr = 1234;
}
// free(ptr)
}
return 0;
}
#endif
Branch prediction is the CPUs task and UB obviously exists in C as well as in C++ so I think an answer to this doesn't require one specific language and my code should work in both languages. If the chosen language however makes a difference I am more interested in C++ than in C but would be happy for any answers.
However what happens if it is not able to predict it correct?
It wastes time & energy doing work that has to be thrown away.
This means that ptr will be NULL and dereferencing it is UB.
No, the language doesn't work that way. The compiler must honor the guard (if-statement) around that dereference.
The compiler must honor the C++ language, full stop! If the compiler generates a speculative load of the null pointer (possible on some ISAs like Itanium), that must be conditional and ignorable, because the program explicitly said not to.
Meanwhile, the hardware must honor the ISA, period! If the hardware generates a speculative load of the null pointer, that also must be conditional and ignorable, because the machine code program explicitly said not to.
via [[likely]] and [[unlikely]] … this is seperate to the branch prediction the CPU performs
Code path hints in the C++ language do not necessarily translate into branch prediction hints in machine code.
Many ISAs do not have (or their implementations do not use) machine code branch direction hints. This is because hardware branch prediction has gotten so good that it is done very early and doesn't need the hint. In order to use the hint, the instruction has to decoded, which happens later than we'd like in processor stages for prediction.
What the C++ compiler can do with those hints though is rearrange the machine code so that the likely path is straight and contiguous and the unlikely paths are relocated elsewhere, out of the way of the straight path.