For those unfamiliar with polymorphic memory resources (PMR), here is the description of `std::pmr::monotonic_buffer_resource`:

> The class `std::pmr::monotonic_buffer_resource` is a special-purpose memory resource class that releases the allocated memory only when the resource is destroyed. It is intended for very fast memory allocations in situations where memory is used to build up a few objects and then is released all at once.
As such, if the only purpose of a destructor is to release memory, there is no point in calling it on objects whose memory comes from a monotonic buffer resource:
```cpp
#include <memory>
#include <memory_resource>
#include <vector>

auto mbr = std::make_unique<std::pmr::monotonic_buffer_resource>();
auto vectors = std::pmr::vector<std::pmr::vector<int>>(mbr.get());
vectors.resize(1'000'000, std::pmr::vector<int>(100)); // Create 1M vectors with 100 ints each
// Reset vectors
// This will cause 1M destructors to be unnecessarily called
vectors = {};
```
As all memory is drawn from `mbr`, I would like to simply destroy the buffer. However, that will not prevent the destructor of `vectors` from being called and trying to deallocate memory that has already been freed.
A really, really bad way to do this is to call `std::memset(&vectors, 0, sizeof(vectors));`. This improves performance by 2.8x, but the comments on the (now deleted) answer strongly agree that it should not be done: `std::pmr::vector` is not trivially copyable, so overwriting its bytes this way is undefined behavior.
Because standard libraries did not fully support PMR yet, I provided a complete example based on Boost here: https://godbolt.org/z/nMbbez (it requires `-lboost_container`).
t.niese had a great suggestion in the comments: change the allocator of the containers. After all, our issue is not with the container, but that the allocator thinks it needs to call `do_deallocate`. We can simply copy the code of Boost's `polymorphic_allocator` and replace `deallocate` with a no-op. You can find the entire code here. In the end, it is just copied from `polymorphic_allocator.hpp`, with `deallocate` replaced like this:

```cpp
// Intentionally a no-op: the memory is reclaimed when the memory resource is destroyed
void deallocate(T* p, size_t n) noexcept {}
```
We can verify that this works by using a printing memory resource (see the `debug_resource` here or the one in the gist linked above). If we allocate 3 inner vectors of 100 entries each using the original `polymorphic_allocator`, we see:
```cpp
my_vector<my_vector<int>> vectors{mbr.get()};
vectors.resize(3, my_vector<int>(100));
// malloc 96
// malloc 400
// malloc 400
// malloc 400
// free 400
// free 400
// free 400
// free 96
```
Note that we did not need to pass `mbr` into the inner vectors; PMR does this automatically through uses-allocator construction. With our new `no_free_allocator`, the result is only:
```cpp
// malloc 96
// malloc 400
// malloc 400
// malloc 400
```
As expected, the memory is no longer freed, but the objects are still properly destroyed. Let us look at how this changes the performance of destruction. The entire benchmark is also contained in the gist. Note that I changed the size of the inner vectors and preallocated memory for the `monotonic_buffer_resource`. This makes the effects more pronounced and highlights the benefits of PMR, and of MBR in particular. Here are the results (`g++10 -O3`):
| Operation (time in µs) | `polymorphic_allocator` | `no_free_allocator` |
|---|---|---|
| resize w/ `default_resource` | 47910 | 62574 |
| resize w/ `monotonic_buffer_resource` | 30298 | 30947 |
| reset w/ `default_resource` | 12298 | 3 (but leaks) |
| reset w/ `monotonic_buffer_resource` | 5463 | 2520 |
First, this confirms the initial motivation for using a `monotonic_buffer_resource`: the resize is much faster if we do not call `malloc` for each vector. I am still unsure why the `no_free_allocator` is slower than the `polymorphic_allocator` when no MBR is used, but that is a different topic.
Second, by removing the virtual calls to `do_deallocate`, we reduce the cost of resetting the outer vector and the MBR by just over 50% (from 5463 µs to 2520 µs). The remaining 2520 µs are caused by having to free the MBR, not by resetting the vectors.
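Compared to `std::allocator` (which takes 58087 µs to resize and 10741 µs to reset, 68828 µs in total), the combination of `monotonic_buffer_resource` and `no_free_allocator` (30947 + 2520 = 33467 µs) is roughly twice as fast.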