c++ c++11 clang llvm compiler-optimization

Is Clang really this smart?


If I compile the following code with Clang 3.3 using -O3 -fno-vectorize, I get the same assembly output even if I remove the commented line. The code type-puns every possible 32-bit integer to a float and counts the ones that fall in the range [0, 1]. Is Clang's optimizer actually smart enough to realize that 0xFFFFFFFF, when punned to float, is not in the range [0, 1], and therefore to drop the second call to fn entirely? GCC produces different code when the second call is removed.

#include <limits>
#include <cstring>
#include <cstdint>

template <class TO, class FROM>
inline TO punning_cast(const FROM &input)
{
    TO out;
    std::memcpy(&out, &input, sizeof(TO));
    return out;
}

int main()
{
    uint32_t count = 0;
  
    auto fn = [&count] (uint32_t x) {
        float f = punning_cast<float>(x);
        if (f >= 0.0f && f <= 1.0f)
            count++;
    };
    
    for(uint32_t i = 0; i < std::numeric_limits<uint32_t>::max(); ++i)
    {
        fn(i);
    }
    fn(std::numeric_limits<uint32_t>::max()); //removing this changes nothing
  
    return count;
}
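To make the premise concrete, here is a minimal sketch of why the pattern 0xFFFFFFFF cannot land in [0, 1]. It assumes float is an IEEE-754 binary32 value (1 sign bit, 8 exponent bits, 23 mantissa bits); the names Binary32Fields, decode_fields and encodes_nan are hypothetical helpers for illustration, not part of the code above.

```cpp
#include <cstdint>

// Fields of a 32-bit pattern, ASSUMING the IEEE-754 binary32 layout.
struct Binary32Fields {
    uint32_t sign;      // 1 bit
    uint32_t exponent;  // 8 bits, biased by 127
    uint32_t mantissa;  // 23 bits
};

// Hypothetical helper: split a raw bit pattern into its three fields.
inline Binary32Fields decode_fields(uint32_t bits)
{
    Binary32Fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}

// A pattern encodes NaN when the exponent is all ones and the mantissa is nonzero.
inline bool encodes_nan(uint32_t bits)
{
    const Binary32Fields f = decode_fields(bits);
    return f.exponent == 0xFFu && f.mantissa != 0u;
}
```

Under that assumption, encodes_nan(0xFFFFFFFF) is true, and every ordered comparison against NaN is false, so the condition f >= 0.0f && f <= 1.0f cannot hold for the final call.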



Solution

  • Yes, it looks like Clang really is this smart.

    Test:

    #include <limits>
    #include <cstring>
    #include <cstdint>
    
    template <class TO, class FROM>
    inline TO punning_cast(const FROM &input)
    {
        TO out;
        std::memcpy(&out, &input, sizeof(TO));
        return out;
    }
    
    int main()
    {
        uint32_t count = 0;
    
        auto fn = [&count] (uint32_t x) {
            float f = punning_cast<float>(x);
            if (f >= 0.0f && f <= 1.0f)
                count++;
        };
    
        for(uint32_t i = 0; i < std::numeric_limits<uint32_t>::max(); ++i)
        {
            fn(i);
        }
    #ifdef X
        fn(0x3f800000); /* 1.0f */
    #endif
    
        return count;
    }
    

    Result:

    $ c++ -S -DX -O3 foo.cpp -std=c++11 -o foo.s
    $ c++ -S -O3 foo.cpp -std=c++11 -o foo2.s
    $ diff foo.s foo2.s
    100d99
    <   incl    %eax
    

    Observe that Clang has reduced the call to fn(0x3f800000) to a single increment instruction, since the bit pattern decodes to 1.0f, which lies inside [0, 1]. This is correct.

    My guess is that Clang inlines the call and, because the argument is a constant, evaluates it at compile time, and that it is capable of following memcpy through the type pun (probably by simply emulating its effect on the constant value).
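As a rough illustration of what such folding would have to compute, the sketch below mirrors the lambda body as a standalone function that returns how much count changes for one call. The name contribution is hypothetical, and the code assumes a 32-bit IEEE-754 float; it says nothing about how Clang implements the optimization internally.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper mirroring the lambda body: returns how much `count`
// would change for a single call with the given bit pattern (0 or 1).
inline uint32_t contribution(uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");
    float f;
    std::memcpy(&f, &bits, sizeof f);  // same type pun as punning_cast<float>
    return (f >= 0.0f && f <= 1.0f) ? 1u : 0u;
}
```

For a constant argument the result is known up front: contribution(0x3f800000) is 1, which matches the lone incl %eax in the diff above, while contribution(0xFFFFFFFF) is 0, so a call with that constant leaves nothing to emit.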