c++performancebit-manipulationmemory-efficient

Efficient bit-mapping: one-time computation?


I have an array of up to 32 rows (often less) of uint32_t values that are essentially just bitmaps - so a total of 1024 bits maximum. The data in these <=32 uint32_ts changes frequently (multiple times/millisecond).

For a vectorized algorithm I'm running, I need to extract some of the data from these bitmaps and rearrange the bits into the format my algorithm expects. The positions of the data I need to extract is known at runtime and will not change during the run. I want to find a way - even if it's a bit computationally heavy, as long as it's not memory-heavy I can frontload it - to compute some mapping function that will enable me to quickly extract the data into the format my algorithm expects, so I can run it as frequently as possible to keep up with the data.

The mapping looks something like this, presented in uint8_t binary format below for simplicity.

Input data (nominally up to 32 uint32_ts, presented here as 3 uint8_ts):

abcd efgh
ijkl mnop
qrst uvwx

I want to extract some of these, and create something like the following output data:

00000adu
00000fgx

Here, I'm only extracting 6 of the 24 bits. As a note, the data I extract will always:

  1. Be an even number of bits - I need pairs, for instance "af" and "dg" and "ux" above.
  2. Each "pair" will always come from either the same row (uint32_t) or column (bit position within all 32 of the uint32_ts). I will never have some data come from the same row and other data come from the same column; it's one or the other for a given configuration.
  3. The algorithm expects to operate on a pair of output uint32_ts. In the case where I extract more than 64 bits - and so the result won't fit in 2 uint32_ts - I'll have to expand into a pair of arrays of uint32_ts.

Obviously there's the naive approach, but it doesn't operate very quickly - I'm hoping to run this as many times a second as possible.

#define EXTRACT_PAIRS_COUNT 3
const uint8_t colExtractLeft[EXTRACT_PAIRS_COUNT] = {7, 4, 3}
const uint8_t colExtractRight[EXTRACT_PAIRS_COUNT] = {2, 1, 0}
const uint8_t rowExtract[EXTRACT_PAIRS_COUNT] = {0, 0, 2}

uint8_t input[3] = {abcdefgh, ijklmnop, qrstuvwx};

while (inputChanges) {
    uint8_t outputLeft = 0;
    uint8_t outputRight = 0;
    for (size_t i=0; i<EXTRACT_PAIRS_COUNT; i++) {
        outputLeft |= ((input[rowExtract[i]] >> colExtractLeft[i]) & 0x1) << i;
        outputRight |= ((input[rowExtract[i]] >> colExtractRight[i]) & 0x1) << i;
    }
    runAlgorithm(outputLeft, outputRight);
}

(Note that the #defines and const variables at the top of that example are just for demonstration/clarity, in reality they're calculated at runtime)

How can I efficnetly map bits into this format?


Solution

  • Since you are working on a fixed number of bits, you can try std::bitset<N> which allocates N bits on automatic storage; since the size is a constexpr(compile-time constant) this class doesn't use dynamic allocation. std::vector<bool> on the other hand, allocates packed bits dynamically with runtime size and resizable storage(I guess you don't need this). The problem with std::bitset is that minimum size (sizeof) and alignment (alignof) are underspecified and platform dependent. On a 32 bit machine you might write:

    alignas(std::uint32_t)
    std::array<std::bitset<32>,32> bitmap;
    bitmap[5][7]=true;
    static_assert(sizeof(bitmap)==sizeof(std::uint32_t[32]));//on 32bit platform
    

    But due to underspecification, on a 64 bit platform this might waste another 1024 bits:

    static_assert(sizeof(bitmap)==sizeof(std::uint64_t[32]));//on 64bit platform
    

    Another approach can be to define your own class in terms of std::bitset:

    struct bitmap{
         std::bitset<1024> value;
    
         using index_pair=std::pair<std::uint32_t,std::uint32_t>;
    
         auto operator [](index_pair ij)
         {return value[(ij.first%32)*32+ij.second%32];};
    
         auto operator [](index_pair ij) const
         {return value[(ij.first%32)*32+ij.second%32];};
    };
    
    bitmap bm;
    bm[{5u, 7u}]=true;
    
    

    std::bitset has a tone of accessor functions. Thus, the second approach may need to redefine many of them. With C++23 it is possible to define and use the indexer much simpler:

        auto bitmap::operator [](this auto&& self, uint32_t i, uint32_t j);
       
    bm[5u, 7u]=true;
    

    Which is great syntax improvement for readability.