I recently came across the bitset templates and would really like to use them in my current project. Reading on, I see that the std::bitset
template must have a size determined at compile time. Many suggest using the boost::dynamic_bitset
to alleviate this requirement.
To compare the two, I decided to do a speed comparison of set
, flip
, and count
methods.
The results are quite odd...and I'm wondering if someone can shed some light on it for me.
The code is at the end of the post, but I'll explain what I'm doing here. I have one std::bitset
object (call it bs
) and one boost::dynamic_bitset
object (call it dynbs
). Each has n=1000000
bits. For a given method above, call the method on each of the n
bits sequentially and repeat this R=10000
times.
Using the std::chrono
library, here are the timings for each in nanoseconds:
set
bitset: 267 nsecs
dyn bitset: 18603174546 nsecs
flip
bitset: 73 nsecs
dyn bitset: 18842352867 nsecs
count
bitset: 77 nsecs
dyn bitset: 51 nsecs
The boost::dynamic_bitset
seems to be much slower for set
and flip
.
To make it more interesting, if the method reset
is called on the two objects before running these tests, then the timings are comparable. Here they are:
set
bitset: 19397779399 nsecs
dyn bitset: 18472863864 nsecs
flip
bitset: 18599248629 nsecs
dyn bitset: 18376267939 nsecs
count
bitset: 68 nsecs
dyn bitset: 61 nsecs
Now, both containers claim to initialize all bits to 0
, thus a call to reset
shouldn't change any of the bits. Dumping the output of none
before and after reset
does confirm this.
So after all that, I have two questions:
1) Why is the boost::dynamic_bitset
so much slower than the std::bitset
when calling set
and flip
?
2) Why does calling reset
have a huge negative effect on the speed of std::bitset
?
Here is my code:
#include <iostream>
#include <iomanip>
#include <bitset>
#include <boost/dynamic_bitset.hpp>
#include <vector>
#include <chrono>
#include <ctime>
using namespace std;
using namespace chrono;
using namespace boost;
int main(){
const unsigned int n=1000000;
bitset< n > bs;
dynamic_bitset< > dynbs(n);
// bs.reset();
// dynbs.reset();
unsigned int i,r,R=10000;
high_resolution_clock::time_point tick,tock;
////////////////////////////////////////////////////////////
// Method: set
std::cout << "set" << std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
bs.set(i);
tock=high_resolution_clock::now();
cout << setw(16) << "bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
dynbs.set(i);
tock=high_resolution_clock::now();
cout << setw(16) << "dyn bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl << std::endl;
////////////////////////////////////////////////////////////
// Method: flip
std::cout << "flip" << std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
bs.flip(i);
tock=high_resolution_clock::now();
cout << setw(16) << "bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
dynbs.flip(i);
tock=high_resolution_clock::now();
cout << setw(16) << "dyn bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl << std::endl;
////////////////////////////////////////////////////////////
// Method: count
std::cout << "count" << std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
bs.count();
tock=high_resolution_clock::now();
cout << setw(16) << "bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl;
tick=high_resolution_clock::now();
for(r=0; r<R; r++)
for(i=0; i<n; i++)
dynbs.count();
tock=high_resolution_clock::now();
cout << setw(16) << "dyn bitset: "
<< setw(16) << duration_cast<nanoseconds>(tock-tick).count() << " nsecs"
<< std::endl;
return 0;
}
I compiled it with
g++ -O3 -std=c++11 bitset.cpp -o bitset
where bitset.cpp
is the code inserted above.
1) Why is the
boost::dynamic_bitset
so much slower than thestd::bitset
when calling set and flip?
Since std::bitset
does not use dynamic allocation, and your bitset
is a local variable, the compiler can easily determine that all the set
's and flip
s and count
s have no externally visible effect. As a result, it optimizes them all away, and your code basically ends up being a bunch of timing and printing calls.
Note that in the above assembly it doesn't even allocate stack space for the bitset. The whole thing basically vanished without a trace.
boost::dynamic_bitset
allocates its buffer dynamically with new
, which ends up calling ::operator new()
, which can be an arbitrary overloaded version defined in a different translation unit. So it's very difficult for the compiler to prove that changes to the buffer returned are not visible externally.
Your count
loops for both std::bitset
and boost::dynamic_bitset
are optimized away because the compiler can easily see that count()
doesn't change anything in the bitsets and you don't use the return value.
2) Why does calling
reset
have a huge negative effect on the speed of std::bitset?
I checked the source code of reset
in GCC, and it calls a compiler intrinsic __builtin_memset
, passing it a pointer to the buffer. When you pass a pointer to a stack variable to an external function, then the compiler is much more constrained in what it can remove, since changes in the variable may now be externally observable (for example, the function called could have stored a copy of the pointer somewhere and something can peek into it later).