c++algorithm compiler-optimization micro-optimization likely-unlikely

Why doesn't the C++ standard library utilize likely/unlikely attributes?

When reading the implementation of the Standard Library algorithms in Visual Studio, they don't use the [[likely]]/[[unlikely]] attributes at all. To me, a textbook example of where [[unlikely]] should be used would be for example std::find_if(...), as Microsoft has implemented like this:

// Note that some noise are removed from the code
template <class _InIt, class _Pr>
_InIt find_if(_InIt _First, const _InIt _Last, _Pr _Pred) {
    for (; _First != _Last; ++_First) {
        if (_Pred(*_First)) {
            break;
        }
    }
    return _First;
}

As mentioned, find_if(...) is a textbook example of where the true branch of the if-clause is unlikely, as most often it iterates a couple of elements before the predicates validate to true, and therefore [[unlikely]] would be an optimization opportunity. Is there any reason Microsoft isn't using the [[unlikely]] attribute here?

Solution

[[likely]] and [[unlikely]] are only a good idea when tuning for a specific use-case. You'd have to know that e.g. it's not common for the first element to match, and that iterator range is more than 0 or 1 elements most of the time. A standard library header for a relatively simple function is the opposite of a good place to use these attributes. (Maybe somewhere deep in a sort function could make sense.)

Breaking out of a loop is pretty "obvious" to the compiler already: their heuristics for guessing whether a branch is likely or unlikely already know that if()break branches are true at most once per run of the whole loop.

Also, that just informs how to lay out the code, not directly hint the CPU hardware predictor (only even possible on a few rare ISAs, other than for the fallback to static prediction which isn't even a thing on modern x86 anymore). See Is it possible to tell the branch predictor how likely it is to follow the branch? (no on x86 other than Pentium 4, and no on ARM or AArch64, so any of the ISAs MSVC targets).

And there aren't that many choices for breaking out of a loop. With an if/else, you can make one path a straight line (zero taken branches, and ideal I-cache locality) while the other has to jump to another block and back (which you put past the end of the function for example.) See Can I improve branch prediction with my code? for a more recent overview / summary of the point of likely/unlikely stuff with links to more.

I guess if you know one way to leave a loop is more likely than the other, you might prefer to the that branch at the bottom of the loop, as the bottom of the do{}while() asm loop structure so the other branch can be (almost) never taken, and depending on the branch prediction hardware, maybe pollute things for other branches less?

If there's no entry for a branch in a CPU that uses static prediction as a fallback, if the forward-not-taken prediction was correct, the CPU might not evict any other prediction entry to make room for this one. Only doing so if / when it does first mispredict. So if a compiler knew which of multiple possible exits a loop was most likely to take, and could cheaply rotate it so that one was at the bottom, it might be a small win. But we don't know that in an STL function.

(Again, on modern x86 with IT-TAGE predictors this isn't a factor. There's no static prediction for conditional branches, since the dynamic predictor will produce a guess for every branch. It might not have a target address until after decode, and not at all for indirect branches if it hasn't seen the branch execute recently, but if()break will be a direct conditional branch. *Why did Intel change the static branch prediction mechanism over these years?)

Would [[unlikely]] on the if()break be hinting the compiler that it was more likely to exit the loop via the other termination condition, by getting to the end of the range without finding any matches? We definitely don't want to do that, it might not be true in some use-cases. Better to let the compiler apply its normal heuristics for guessing which code-paths to favour when inlining this loop into a caller.

TL:DR: there's basically nothing to gain.

Did you try it and find better asm in an optimized build? (With MSVC, since this is from MSVC's library). If it doesn't make it worse for other use-cases, then consider filing a missed-optimization bug report (on MS's forums? I'm not sure how they take reports like that). Most likely the compiler devs would rather tweak the compiler's heuristics to improve code-gen for all similar loops, rather than cluttering the standard library with [[likely]] or [[unlikely]].