c++, templates, memory-management, std, unordered-map

std::unordered_map reserve memory management policy


Hi, I am working with std::unordered_map and dealing with a lot of inserts, where the number of inserts is known ahead of time. I thought I could optimize memory usage by calling the reserve method, assuming it pre-allocates a big chunk of memory for all future inserts. However, I have noticed that it makes no difference to the number of times the map allocates memory.

See the example below: the number of allocations is the same for the map that reserves space for all the inserts and the one that doesn't. Why?

Is there any way to reduce the number of allocations?

#include <string>
#include <unordered_map>
#include <iostream>
#include <vector>
    
using namespace std;
    
inline size_t AllocationCounter = 0;
    
template <typename T>
struct my_alloc {
    using value_type = T;
    
    my_alloc() = default;
    
    template <typename U>
    constexpr my_alloc(const my_alloc<U>& Other) noexcept  {  }
    
    T* allocate(std::size_t n) {
        AllocationCounter++;
        return  static_cast<T*>(::operator new (n * sizeof(T)));
    }
    
    template <typename... Args>
    void construct(T* p, Args&&... args) {
        new (p) T(std::forward<Args>(args)...);
    }
    
    void deallocate(T* p, std::size_t n) noexcept {
        AllocationCounter--;
        ::operator delete(p);
    }
    
    void destroy(T* p) noexcept {
        p->~T();
    }
};
    
void with_reserve()
{
    unordered_map<size_t,size_t, std::hash<size_t>, std::equal_to<size_t>, my_alloc<std::pair<const size_t, size_t>>> m1;
    
    m1.reserve(1000); /// <------------------ RESERVATION ?
       
    for (size_t i = 0; i < 1000; i++)
    {
        m1.insert({i,i});
    }
    
    printf("With reserve - %llu Allocations\n", AllocationCounter);
}
    
void without_reserve()
{
    unordered_map<size_t,size_t, std::hash<size_t>, std::equal_to<size_t>, my_alloc<std::pair<const size_t, size_t>>> m1;
    
    for (size_t i = 0; i < 1000; i++)
    {
        m1.insert({i,i});
    }
    
    printf("Without reserve - %llu Allocations\n", AllocationCounter);
}
    
int main()
{
    with_reserve();
    
    without_reserve();     
}

Output:

With reserve - 1001 Allocations
Without reserve - 1001 Allocations

Expected: the number of allocations with reserve to be less than without.


Solution

  • Use a custom allocator, not just for profiling but to avoid heap allocations. If your std::unordered_map sees only insertions and lookups, but no removals, then you can use a trivial arena-type allocator backed by a single buffer big enough to hold both the bucket array and all of the nodes.

    Doing so will - for all STL implementations I'm aware of - greatly accelerate both inserts and the final teardown. The allocators used for the nodes and the bucket list are derived, by rebinding, from the element allocator passed to the map itself (a short sketch of that mechanism follows the allocator code below).

    I.e. the following allocator is suitable, and can even simply be allocated in the same scope as the map itself, including on the stack. In the normal case this results in zero heap allocations.

    #include <cstddef>   // std::byte, std::size_t
    #include <cstdlib>   // std::malloc, std::free
    #include <limits>    // std::numeric_limits
    #include <memory>    // std::align
    #include <new>       // std::bad_alloc, std::bad_array_new_length

    struct arena
    {
        arena() = delete;
        arena(arena&&) = delete;
        arena(const arena&) = delete;
        constexpr arena(void* buffer, size_t size) noexcept: base(buffer), top(buffer), bytes_remaining(size)
        {
        }
        void* const base;
        // Current top of buffer.
        void* top;
        size_t bytes_remaining;
    };
    
    template<size_t preallocated_size = 4096>
    struct inplace_arena : public arena
    {
        inplace_arena() noexcept: arena(_storage, sizeof(_storage))
        {
        }
    
    private:
        std::byte _storage[preallocated_size];
    };
    
    template<class T>
    struct arena_allocator
    {
        typedef T value_type;
    
        arena_allocator() = delete;
    
        arena_allocator(arena& storage) noexcept: _storage(storage)
        {
        }
    
        template<class U>
        constexpr arena_allocator(const arena_allocator<U>& other) noexcept: _storage(other._storage)
        {
        }
    
        [[nodiscard]] T* allocate(std::size_t n)
        {
            if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
                throw std::bad_array_new_length();
    
            const auto allocation_size = n * sizeof(T);
            if (auto from_arena =
                    static_cast<T*>(std::align(alignof(T), allocation_size, _storage.top, _storage.bytes_remaining)))
            {
                _storage.top = reinterpret_cast<std::byte*>(_storage.top) + allocation_size;
                _storage.bytes_remaining -= allocation_size;
                return from_arena;
            }
            else
            {
                if (auto from_heap = static_cast<T*>(std::malloc(allocation_size)))
                {
                    return from_heap;
                }
            }
    
            throw std::bad_alloc();
        }
    
        void deallocate(T* p, std::size_t /* n */) noexcept
        {
            if (p >= _storage.base && p < _storage.top)
            {
                // Pass, no point in recycling.
            }
            else
            {
                std::free(p);
            }
        }
    
        arena& _storage;
    };
    
    template<class T, class U>
    bool operator==(const arena_allocator<T>& a, const arena_allocator<U>& b)
    {
        return &a._storage == &b._storage;
    }
    
    template<class T, class U>
    bool operator!=(const arena_allocator<T>& a, const arena_allocator<U>& b)
    {
        return !(a == b);
    }
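
    For reference on the rebinding mentioned above: the container never uses arena_allocator<element_type> directly for its internals; it rebinds the allocator, via std::allocator_traits, to its implementation-specific node and bucket-pointer types, and the converting constructor template is what lets every rebound copy keep referring to the same arena. A small sketch, where node_stand_in is a made-up placeholder for the real (implementation-specific) node type:

    #include <memory>       // std::allocator_traits
    #include <type_traits>  // std::is_same_v
    #include <utility>      // std::pair

    // Hypothetical stand-in only; the real node layout is implementation specific.
    struct node_stand_in
    {
        void* next;
        std::pair<const int, int> kv;
    };

    using element_allocator = arena_allocator<std::pair<const int, int>>;

    // What the container effectively does internally when it needs node storage:
    using node_allocator = std::allocator_traits<element_allocator>::rebind_alloc<node_stand_in>;
    static_assert(std::is_same_v<node_allocator, arena_allocator<node_stand_in>>);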
    

    Use it like

    using element_type = std::pair<const key, value>;
    // Enough storage for the nodes, the "past the end" unique sentinel, the links between nodes and the bucket list itself.
    // Somewhat implementation specific, use a debugger to verify if it's large enough.
    inplace_arena<(sizeof(element_type) + sizeof(void*) * 3) * (expected_element_count + 1)> elements_storage;
    using element_allocator = arena_allocator<element_type>;
    std::unordered_map<key, value, std::hash<key>, std::equal_to<key>, element_allocator>
        elements{element_allocator{elements_storage}};
    elements.reserve(expected_element_count);
    

    Even when using a custom allocator, you should still call .reserve(): growing the bucket list, and the rehash that potentially follows, is still something to avoid at all costs.


    Expected: the number of allocations with reserve to be less than without.

    No, but only because you are counting incorrectly: your deallocate() decrements the counter. Without reserve, the bucket array is reallocated repeatedly as the map grows, but each of those extra allocations was cancelled out the moment the old array was freed and the counter decremented. There were roughly log(n) extra allocations that your final count does not show; a counter that only ever increments (see the sketch below) makes the difference visible.
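
    A minimal sketch of such a counter, assuming C++17 (the names counting_alloc and TotalAllocations are mine, not from your code): count every allocate() call and never decrement. On a typical node-based implementation you would then expect roughly one allocation per inserted node plus a single bucket-array allocation with reserve, and several additional bucket-array allocations without it.

    #include <cstddef>
    #include <cstdio>
    #include <functional>
    #include <new>
    #include <unordered_map>
    #include <utility>

    inline std::size_t TotalAllocations = 0; // only ever incremented

    template <typename T>
    struct counting_alloc {
        using value_type = T;

        counting_alloc() = default;
        template <typename U>
        constexpr counting_alloc(const counting_alloc<U>&) noexcept {}

        T* allocate(std::size_t n) {
            ++TotalAllocations;                  // count, never undo
            return static_cast<T*>(::operator new(n * sizeof(T)));
        }
        void deallocate(T* p, std::size_t) noexcept {
            ::operator delete(p);                // release memory, leave the counter alone
        }
    };

    template <class T, class U>
    bool operator==(const counting_alloc<T>&, const counting_alloc<U>&) noexcept { return true; }
    template <class T, class U>
    bool operator!=(const counting_alloc<T>&, const counting_alloc<U>&) noexcept { return false; }

    using map_t = std::unordered_map<std::size_t, std::size_t, std::hash<std::size_t>,
                                     std::equal_to<std::size_t>,
                                     counting_alloc<std::pair<const std::size_t, std::size_t>>>;

    int main() {
        {
            map_t m;
            m.reserve(1000);
            for (std::size_t i = 0; i < 1000; ++i) m.insert({i, i});
            std::printf("With reserve    - %zu allocations\n", TotalAllocations);
        }
        TotalAllocations = 0;
        {
            map_t m;
            for (std::size_t i = 0; i < 1000; ++i) m.insert({i, i});
            std::printf("Without reserve - %zu allocations\n", TotalAllocations);
        }
    }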