c++for-loopclang-tidy

What is a loop with an ID-dependent backward branch?


I am trying to understand this clang-tidy warning: altera-id-dependent-backward-branch that seems to be triggered by this loop.

        for(; first != current; ++first)

The example I have is this code, that looks almost exactly as a perfectly reasonable implementation of std::uninitialized_fill_n.

The static analyzer complains that:

error: backward branch (for loop) is ID-dependent due to variable reference to 'current' and may cause performance degradation [altera-id-dependent-backward-branch,-warnings-as-errors]
                        for(; current != first; ++first) {
uninitialized_fill_n(ForwardIt first, Size n, T const& v) {
    ForwardIt current = first;
    try {
        for(; n > 0; ++current, --n) {
            new (std::addressof(*current)) T(v);
        }
        return current;
    } catch(...) {
        for(; first != current; ++first) {  // clang-tidy error here
            std::addressof(*first)->~T();
        }
        throw;
    }
}

I tried different ways to rewrite the code (for example making the loop backward), to suppress this warning but I couldn't. Is there a standard way to rewrite the fallback (catch) loop in a way that is ok with altera-id-dependent-backward-branch?

I am using clang-tidy 13.0.0.


I also found another instance of the same warning in my code base (this is an implementation of a hash function, by Howard Hinnart):

auto fnv1a(void const* key, std::size_t len, std::size_t h) noexcept {
    auto const *p = static_cast<unsigned char const*>(key);
    unsigned char const* const e = p + len;
    for(; p < e; ++p) {  // warning here
        h = (h ^ *p) * 1099511628211U; // prime
    }
    return h;
} 

Solution

  • The altera-id-dependent-backward-branch warning originates from clang-tidy's FPGA-specific rules. Specifically, when targeting FPGAs, the compiler transforms C++ into hardware, and hardware has different performance characteristics compared to traditional CPUs.

    The warning you're seeing is because the branch in the loop depends on an induction variable (current in your first case and p in the second case). When synthesizing loops into hardware, branches that depend on induction variables may result in longer pipeline stalls. The exact impact is hardware-specific, but for high-performance loops, the impact can be non-trivial.

    To address the warning, the loop's termination condition needs to be made independent of the induction variable.

    Let's start with the first example:

    for(; first != current; ++first) {  // clang-tidy error here
        std::addressof(*first)->~T();
    }
    

    You can determine the number of iterations in advance and then use a fixed loop bound:

    auto distance = std::distance(first, current);
    for(auto i = 0; i < distance; ++i, ++first) {
        std::addressof(*first)->~T();
    }
    

    For the second example:

    for(; p < e; ++p) {  // warning here
        h = (h ^ *p) * 1099511628211U; // prime
    }
    

    You can rewrite it as:

    auto distance = e - p;
    for(std::size_t i = 0; i < distance; ++i, ++p) {
        h = (h ^ *p) * 1099511628211U; // prime
    }
    

    This transformation ensures that the loop bounds are fixed and do not depend on an induction variable. As a result, the FPGA synthesizer should have an easier time creating an efficient pipeline.

    That said, you should always test the synthesized results to verify that the performance meets your expectations. Depending on the specific hardware and the loop's complexity, there may still be performance trade-offs.