I am trying to implement a C++17 portable and efficient Hamming distance function for ORB features, hopefully automatically using SIMD when compiling for both x86 and ARM.
std::bitset
std::bitset<N>
provides a standard way for bitwise operation and count, and it's also superior to __builtin_popcount
. However, it's a container type that owns data, and is not easy to be converted from a 256-bit vector stored as cv::Mat
that is computed by cv::ORB::detectAndCompute
.
This thread asks for converting cv::Mat
to std::bitset<256>
. I don't think memcpy
in its answer is right, since I didn't find the memory layout for std::bitset
in https://en.cppreference.com/w/cpp/utility/bitset. Moreover, std::bitset
constructor does not support to initialize more than sizeof(unsigned long long)
bits.
To avoid copy, my current implementation use a span-like class:
struct ORB_view {
inline static constexpr int length = 256 / sizeof(uint32_t) / 8;
const uint32_t* data;
};
However, this cannot use bitwise operation and popcount directly, and lead to explicit SIMD instructions in implementation, which I would like to avoid by using std::bitset<N>::count()
.
As a result, a non-owning reference providing bitwise operation and count would be much helpful in such a case. The desired features are:
std::bitset
The standard library does not provide it as far as I know.
Firstly, if performance is important, you should stay away from std::bitset
for most purposes. It throws exceptions on invalid inputs and those runtime checks are suboptimal for high-performance applications.
Secondly, if you were able to use C++20, you could use std::popcount
. Prior to C++20, you can use __builtin_popcnt
. You have pointed out portability issues, but those can be overcome with conditional compilation:
#if __cpp_lib_bitops >= 201907L // C++20 std::popcount
#include <bit>
using std::popcount;
#elif defined(__GNUC__) // GCC and clang/LLVM
inline int popcount(unsigned char x) noexcept {
return __builtin_popcount(x);
}
inline int popcount(unsigned short x) noexcept {
return __builtin_popcount(x);
}
inline int popcount(unsigned int x) noexcept {
return __builtin_popcount(x);
}
inline int popcount(unsigned long x) noexcept {
return __builtin_popcountl(x);
}
inline int popcount(unsigned long long x) noexcept {
return __builtin_popcountll(x);
}
#elif defined(_MSC_VER) // MSVC
#include <intrin.h>
__forceinline int popcount(std::uint8_t x) noexcept {
return static_cast<int>(__popcnt16(x));
}
__forceinline int popcount(std::uint16_t x) noexcept {
return static_cast<int>(__popcnt16(x));
}
__forceinline int popcount(std::uint32_t x) noexcept {
return static_cast<int>(__popcnt(x));
}
__forceinline int popcount(std::uint64_t x) noexcept {
return static_cast<int>(__popcnt64(x));
}
#endif
See live example at Compiler Explorer.
With this wrapper (taken from the bitmanip
library), you can call popcount
safely for GCC, MSVC, and clang. The Android native compiler is based on LLVM, so it should also work.
It's very easy to create a a wrapper like:
int popcount_256(const std::uint64_t nums[4]) {
int sum = 0;
for (std::size_t i = 0; i < 4; ++i) {
sum += popcount(nums[i]);
}
return sum;
}
Obviously, this could also be a member function of your ORB_view
. There is no explicit SIMD-code, though clang auto-vectorizes this code.