c++c++17bitwise-operatorssimdstd-bitset

Is there a non-owning reference similar to std::bitset to provide bitwise operation and count for data in other container?


I am trying to implement a C++17 portable and efficient Hamming distance function for ORB features, hopefully automatically using SIMD when compiling for both x86 and ARM.

Problem with std::bitset

std::bitset<N> provides a standard way for bitwise operation and count, and it's also superior to __builtin_popcount. However, it's a container type that owns data, and is not easy to be converted from a 256-bit vector stored as cv::Mat that is computed by cv::ORB::detectAndCompute.

This thread asks for converting cv::Mat to std::bitset<256>. I don't think memcpy in its answer is right, since I didn't find the memory layout for std::bitset in https://en.cppreference.com/w/cpp/utility/bitset. Moreover, std::bitset constructor does not support to initialize more than sizeof(unsigned long long) bits.

Problem with my implementation

To avoid copy, my current implementation use a span-like class:

struct ORB_view {
  inline static constexpr int length = 256 / sizeof(uint32_t) / 8;
  const uint32_t* data;
};

However, this cannot use bitwise operation and popcount directly, and lead to explicit SIMD instructions in implementation, which I would like to avoid by using std::bitset<N>::count().

What I expect

As a result, a non-owning reference providing bitwise operation and count would be much helpful in such a case. The desired features are:

The standard library does not provide it as far as I know.


Solution

  • Firstly, if performance is important, you should stay away from std::bitset for most purposes. It throws exceptions on invalid inputs and those runtime checks are suboptimal for high-performance applications.

    Secondly, if you were able to use C++20, you could use std::popcount. Prior to C++20, you can use __builtin_popcnt. You have pointed out portability issues, but those can be overcome with conditional compilation:

    #if __cpp_lib_bitops >= 201907L // C++20 std::popcount
    #include <bit>
    using std::popcount;
    #elif defined(__GNUC__) // GCC and clang/LLVM
    inline int popcount(unsigned char x) noexcept {
        return __builtin_popcount(x);
    }
    inline int popcount(unsigned short x) noexcept {
        return __builtin_popcount(x);
    }
    inline int popcount(unsigned int x) noexcept {
        return __builtin_popcount(x);
    }
    inline int popcount(unsigned long x) noexcept {
        return __builtin_popcountl(x);
    }
    inline int popcount(unsigned long long x) noexcept {
        return __builtin_popcountll(x);
    }
    #elif defined(_MSC_VER) // MSVC
    #include <intrin.h>
    __forceinline int popcount(std::uint8_t x) noexcept {
        return static_cast<int>(__popcnt16(x));
    }
    __forceinline int popcount(std::uint16_t x) noexcept {
        return static_cast<int>(__popcnt16(x));
    }
    __forceinline int popcount(std::uint32_t x) noexcept {
        return static_cast<int>(__popcnt(x));
    }
    __forceinline int popcount(std::uint64_t x) noexcept {
        return static_cast<int>(__popcnt64(x));
    }
    #endif
    

    See live example at Compiler Explorer.

    With this wrapper (taken from the bitmanip library), you can call popcount safely for GCC, MSVC, and clang. The Android native compiler is based on LLVM, so it should also work.

    It's very easy to create a a wrapper like:

    int popcount_256(const std::uint64_t nums[4]) {
        int sum = 0;
        for (std::size_t i = 0; i < 4; ++i) {
            sum += popcount(nums[i]);
        }
        return sum;
    }
    

    Obviously, this could also be a member function of your ORB_view. There is no explicit SIMD-code, though clang auto-vectorizes this code.