c++ vector binaryfiles bloom-filter boost-dynamic-bitset

C++ Storing a dynamic_bitset into a file


Sort of a follow-up to "How does one store a vector<bool> or a bitset into a file, but bit-wise?"

Basically, I am writing a bitset to a binary file with the following code:

boost::dynamic_bitset<boost::dynamic_bitset<>::block_type> filter;
vector<boost::dynamic_bitset<>::block_type> filterBlocks(filter.num_blocks());

//populate vector blocks
boost::to_block_range(filter, filterBlocks.begin());

ofstream myFile(filterFilePath.c_str(), ios::out | ios::binary);

//write out each block
for (vector<boost::dynamic_bitset<>::block_type>::iterator it =
        filterBlocks.begin(); it != filterBlocks.end(); ++it)
{
    //retrieves block and converts it to a char*
    myFile.write(reinterpret_cast<char*>(&*it),
            sizeof(boost::dynamic_bitset<>::block_type));
}
myFile.close();

I use to_block_range to copy the dynamic_bitset's blocks into a temporary vector, then write the blocks out to the file. It works, but the intermediate vector doubles my memory usage (the vector is the same size as my bitset). How can I write the bitset to a file without doubling my memory usage?

It would be nice if I could iterate through the bitset in blocks, but it seems the authors of dynamic_bitset intentionally omitted this sort of functionality to prevent some other problems. Should I use a different data structure? If it helps for context, I am using the bitset in some Bloom filter code.


Solution

  • You should do it manually. Iterate over the bits, pack them into unsigned chars, and stream.put the chars into the file.

    Directly writing the native block_type causes the file format to depend on platform-specific endianness, which is generally undesirable. (And setting block_type to char would harm performance.)

    Looking at your other question, I see that this is the same as what Nawaz suggested, and that you might want to go back to using std::vector<bool> instead.