I have an array of up to 32 rows (often less) of uint32_t values that are essentially just bitmaps - so a total of 1024 bits maximum. The data in these <=32 uint32_ts changes frequently (multiple times/millisecond).
For a vectorized algorithm I'm running, I need to extract some of the data from these bitmaps and rearrange the bits into the format my algorithm expects. The positions of the data I need to extract is known at runtime and will not change during the run. I want to find a way - even if it's a bit computationally heavy, as long as it's not memory-heavy I can frontload it - to compute some mapping function that will enable me to quickly extract the data into the format my algorithm expects, so I can run it as frequently as possible to keep up with the data.
The mapping looks something like this, presented in uint8_t binary format below for simplicity.
Input data (nominally up to 32 uint32_ts, presented here as 3 uint8_ts):
abcd efgh
ijkl mnop
qrst uvwx
I want to extract some of these, and create something like the following output data:
00000adu
00000fgx
Here, I'm only extracting 6 of the 24 bits. As a note, the data I extract will always:
Obviously there's the naive approach, but it doesn't operate very quickly - I'm hoping to run this as many times a second as possible.
#define EXTRACT_PAIRS_COUNT 3
const uint8_t colExtractLeft[EXTRACT_PAIRS_COUNT] = {7, 4, 3}
const uint8_t colExtractRight[EXTRACT_PAIRS_COUNT] = {2, 1, 0}
const uint8_t rowExtract[EXTRACT_PAIRS_COUNT] = {0, 0, 2}
uint8_t input[3] = {abcdefgh, ijklmnop, qrstuvwx};
while (inputChanges) {
uint8_t outputLeft = 0;
uint8_t outputRight = 0;
for (size_t i=0; i<EXTRACT_PAIRS_COUNT; i++) {
outputLeft |= ((input[rowExtract[i]] >> colExtractLeft[i]) & 0x1) << i;
outputRight |= ((input[rowExtract[i]] >> colExtractRight[i]) & 0x1) << i;
}
runAlgorithm(outputLeft, outputRight);
}
(Note that the #defines and const variables at the top of that example are just for demonstration/clarity, in reality they're calculated at runtime)
How can I efficnetly map bits into this format?
Since you are working on a fixed number of bits, you can try std::bitset<N>
which allocates N
bits on automatic storage; since the size is a constexpr
(compile-time constant) this class doesn't use dynamic allocation. std::vector<bool>
on the other hand, allocates packed bits dynamically with runtime size and resizable storage(I guess you don't need this).
The problem with std::bitset
is that minimum size (sizeof
) and alignment (alignof
) are underspecified and platform dependent. On a 32 bit machine you might write:
alignas(std::uint32_t)
std::array<std::bitset<32>,32> bitmap;
bitmap[5][7]=true;
static_assert(sizeof(bitmap)==sizeof(std::uint32_t[32]));//on 32bit platform
But due to underspecification, on a 64 bit platform this might waste another 1024 bits:
static_assert(sizeof(bitmap)==sizeof(std::uint64_t[32]));//on 64bit platform
Another approach can be to define your own class in terms of std::bitset
:
struct bitmap{
std::bitset<1024> value;
using index_pair=std::pair<std::uint32_t,std::uint32_t>;
auto operator [](index_pair ij)
{return value[(ij.first%32)*32+ij.second%32];};
auto operator [](index_pair ij) const
{return value[(ij.first%32)*32+ij.second%32];};
};
bitmap bm;
bm[{5u, 7u}]=true;
std::bitset
has a tone of accessor functions. Thus, the second approach may need to redefine many of them. With C++23 it is possible to define and use the indexer much simpler:
auto bitmap::operator [](this auto&& self, uint32_t i, uint32_t j);
bm[5u, 7u]=true;
Which is great syntax improvement for readability.