c++ memory memory-management malloc jemalloc

Large allocation metadata is causing fragmentation and wasting DRAM in jemalloc when using extent hooks


I am writing a simple custom extent allocator that returns extents from a pre-mapped area, so that I can explicitly control the mapped area and the allocations in it. I am using the dev branch. Here is a simple test file that sets a custom alloc hook and performs 100 allocations of 1024 bytes (the allocation size doesn't matter).

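#include <cstdint>   // uintptr_t
#include <cstdio>    // printf
#include <cstring>   // memset
#include <iostream>
#include <stdexcept> // std::runtime_error
#include <sys/mman.h>
#include "jemalloc/jemalloc.h" // resolved through the -I path in the build command below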
/*
  To run this test:
  1. clone jemalloc: git clone https://github.com/jemalloc/jemalloc.git
  2. configure jemalloc with --with-jemalloc-prefix=jem_
  3. build test:
    g++ -std=c++17 -I/home/user/jemalloc/include -L/home/user/jemalloc/lib test.cc -o test -ljemalloc
*/
const size_t INITIAL_MMAP_SIZE = 1ULL * 1024 * 1024 * 1024; // 1 GiB

struct ArenaInfo
{
    uintptr_t base_pointer;
    uintptr_t pre_alloc;
} info;

void *extent_alloc_hook_dram(extent_hooks_t *extent_hooks, void *new_addr, size_t size,
                             size_t alignment, bool *zero, bool *commit, unsigned arena_ind)
{
    uintptr_t ret = (info.pre_alloc + alignment - 1) & ~(alignment - 1);
    if (ret + size > info.base_pointer + INITIAL_MMAP_SIZE)
        return nullptr;
    info.pre_alloc = ret + size;
    if (*zero)
        memset(reinterpret_cast<void *>(ret), 0, size);
    printf(">>> extent request: size %zu ret %p alignment %zu\n", size, (void *)ret, alignment);
    return reinterpret_cast<void *>(ret);
}

int main()
{
    extent_hooks_t *hooks = new extent_hooks_s();
    memset(hooks, 0, sizeof(extent_hooks_s));
    hooks->alloc = extent_alloc_hook_dram;
    info.base_pointer = reinterpret_cast<uintptr_t>(mmap(NULL, INITIAL_MMAP_SIZE,
                                            PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
    if (reinterpret_cast<void *>(info.base_pointer) == MAP_FAILED)
    {
        std::cerr << "mmap failed" << std::endl;
        return 1;
    }
    info.pre_alloc = info.base_pointer;
    unsigned arena_id;
    size_t sz = sizeof(arena_id);
    if (jem_mallctl("arenas.create", &arena_id, &sz, (void *)&hooks, sizeof(extent_hooks_t *)))
    {
        throw std::runtime_error("Failed to create new arena with new hooks");
    }
    int flags = MALLOCX_ARENA(arena_id) | MALLOCX_TCACHE_NONE;
    for (int i = 0; i < 100; i++)
    {
        void *p = jem_mallocx(1024, flags);
        std::cout << "1024" << " " << p << std::endl;
    }
    jem_malloc_stats_print(NULL, NULL, NULL);

    return 0;
}

Sample output from the code above:

>>> extent request: size 2097152 ret 0x7fffb6e00000 alignment 2097152
>>> extent request: size 2097152 ret 0x7fffb7000000 alignment 4096
>>> extent request: size 4096 ret 0x7fffb7200000 alignment 4096
1024 0x7fffb7200000
1024 0x7fffb7200400
1024 0x7fffb7200800
1024 0x7fffb7200c00
>>> extent request: size 2097152 ret 0x7fffb7201000 alignment 4096
>>> extent request: size 4096 ret 0x7fffb7401000 alignment 4096
1024 0x7fffb7401000
1024 0x7fffb7401400
1024 0x7fffb7401800
1024 0x7fffb7401c00
>>> extent request: size 2097152 ret 0x7fffb7402000 alignment 4096
>>> extent request: size 4096 ret 0x7fffb7602000 alignment 4096
1024 0x7fffb7602000
1024 0x7fffb7602400
1024 0x7fffb7602800
1024 0x7fffb7602c00
>>> extent request: size 2097152 ret 0x7fffb7603000 alignment 4096
>>> extent request: size 4096 ret 0x7fffb7803000 alignment 4096
1024 0x7fffb7803000
1024 0x7fffb7803400
1024 0x7fffb7803800
1024 0x7fffb7803c00

jemalloc requests 2 MiB extents periodically. After each 2 MiB extent, it requests a 4 KiB extent.

I expected all the 1024-byte allocations (they belong to the same size class) to be placed in a contiguous slab. Instead, after every four 1024-byte allocations, jemalloc requests a 2 MiB extent, presumably for metadata, and then a 4096-byte extent to hold our allocations. As a result, simple sequential allocations of the same size class end up 2 MiB apart for no apparent reason. I call the metadata usage "high and wasteful" because we shouldn't need a 2 MiB metadata extent to manage a single 4 KiB extent. Ideally, I would not expect another request for a 2 MiB extent until a large number of pages are filled.
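To make the spacing concrete, here is a small sketch that replays the bump-pointer arithmetic from extent_alloc_hook_dram on plain integers, without jemalloc. The names Request and replay are made up for illustration, and the starting cursor is assumed to be the first address in the log above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One extent request as seen by the alloc hook (hypothetical helper type).
struct Request {
    std::size_t size;
    std::size_t alignment;
};

// Replays the hook's logic: round the cursor up to the requested alignment,
// hand out [ret, ret + size), then advance the cursor past it.
std::vector<std::uintptr_t> replay(std::uintptr_t cursor,
                                   const std::vector<Request> &reqs)
{
    std::vector<std::uintptr_t> out;
    for (const Request &r : reqs) {
        // The rounding trick below only works for power-of-two alignments.
        assert(r.alignment != 0 && (r.alignment & (r.alignment - 1)) == 0);
        std::uintptr_t mask = static_cast<std::uintptr_t>(r.alignment) - 1;
        std::uintptr_t ret = (cursor + mask) & ~mask;
        out.push_back(ret);
        cursor = ret + r.size;
    }
    return out;
}
```

Feeding it the request sequence from the log (2 MiB aligned to 2 MiB, then alternating 2 MiB and 4 KiB requests aligned to 4 KiB) with a starting cursor of 0x7fffb6e00000 reproduces the logged addresses: the two 4 KiB extents come out 2 MiB + 4 KiB apart, because every 2 MiB request pushes the cursor forward before the next slab is handed out.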


Solution

  • I found a fix. First, it looks like we should either mark the extent as committed in the alloc hook (set *commit to true) or provide an explicit commit hook that returns false, which signals success. Second, we also seem to need to set a split hook explicitly. What the split hook does doesn't seem to matter, only that it isn't null. A null hook means the operation is not allowed, so it is treated as a failure.
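A minimal sketch of what those hooks could look like, under some assumptions: the parameter lists follow jemalloc 5's extent_hooks_t typedefs, Region and g_region are hypothetical bookkeeping names, and the forward declaration of extent_hooks_s is only a stand-in so the snippet compiles without jemalloc (in real code, include the jemalloc header instead). Refusing a non-null new_addr is an extra precaution that was not part of the original test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in so this sketch builds on its own; jemalloc.h provides the real type.
struct extent_hooks_s;
typedef struct extent_hooks_s extent_hooks_t;

// Hypothetical bookkeeping for the pre-mapped, read/write region.
struct Region {
    std::uintptr_t base;
    std::uintptr_t cursor;
    std::size_t size;
};
static Region g_region;

void *alloc_hook(extent_hooks_t *, void *new_addr, std::size_t size,
                 std::size_t alignment, bool *zero, bool *commit, unsigned)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (new_addr != nullptr)
        return nullptr; // a bump allocator cannot place an extent at a chosen address
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t ret = (g_region.cursor + mask) & ~mask;
    if (ret + size > g_region.base + g_region.size)
        return nullptr; // region exhausted
    g_region.cursor = ret + size;
    if (*zero)
        std::memset(reinterpret_cast<void *>(ret), 0, size);
    *commit = true; // the region is already mapped read/write, so report it as committed
    return reinterpret_cast<void *>(ret);
}

// Extent hooks that return bool use false for success.
bool commit_hook(extent_hooks_t *, void *, std::size_t, std::size_t,
                 std::size_t, unsigned)
{
    return false; // nothing to do: memory is already usable
}

bool split_hook(extent_hooks_t *, void *, std::size_t, std::size_t,
                std::size_t, bool, unsigned)
{
    return false; // allow splits; no bookkeeping is needed for one contiguous mapping
}
```

In the test program these would be wired up next to the existing assignment, for example hooks->alloc = alloc_hook; hooks->commit = commit_hook; hooks->split = split_hook;. According to the answer, either setting *commit or providing the commit hook should be enough, so doing both is redundant but harmless.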