Tags: c++, performance, optimization, memory-management, memory-pool

Understanding Memory Pools


To my understanding, a memory pool is a block, or multiple blocks, of memory allocated on the stack before runtime.
By contrast, dynamic memory is requested from the operating system and then allocated on the heap at run time.

// EDIT //

From what I can tell, the purpose of a memory pool is to provide manual management of RAM, where the memory must be tracked and reused by the programmer.

This is theoretically advantageous for performance for a number of reasons:

  1. Dynamic memory becomes fragmented over time.
  2. The CPU can parse static blocks of memory faster than dynamic blocks.
  3. When the programmer has control over memory, they can choose to free and rebuild data when it is best to do so, according to the specific program.
  4. When multithreading, separate pools allow separate threads to operate independently without waiting for the shared heap (Davislor).

Is my understanding of memory pools correct? If so, why does it seem like memory pools are not used very often?


Solution

  • It seems this question is fraught with the XY problem and premature optimisation.

    You should focus on writing legible code, then using a profiler to perform optimisations if necessary.

    Is my understanding of memory pools correct?

    Not quite.

    ... on the stack ...

    ... on the heap ...

    Storage duration is orthogonal to the concept of pools; pools can be allocated to have any of the four storage durations (they are: static, thread, automatic and dynamic storage duration).

    The C++ standard doesn't require that any of these go into a stack or a heap; it might be useful to think of all of them as though they go into the same place... after all, they all (commonly) go onto silicon chips!

    ... allocate ... before runtime ...

    What matters is that the allocation of multiple objects occurs before (or at least less often than) those objects are first used; this saves having to allocate each object separately. I assume this is what you meant by "before runtime". When choosing the size of the allocation, the closer you get to the total number of objects required at any given time the less waste from excessive allocation and the less waste from excessive resizing.

    If your OS isn't prehistoric, however, the advantages of pools will quickly diminish. You'd probably see this if you used a profiler before and after conducting your optimisation!

    1. Dynamic memory becomes fragmented over time

    This may be true for a naive operating system such as Windows 1.0. However, in this day and age objects with dynamic storage duration are commonly stored in virtual memory, which periodically gets written to and read back from disk (this is called paging). As a consequence, fragmented memory can be defragmented, and objects, functions and methods that are more commonly used might even end up being united into common pages.

    That is, paging forms an implicit pool (and cache prediction) for you!

    The CPU can parse static blocks of memory faster than dynamic blocks

    While objects with static storage duration are commonly placed in a fixed data segment rather than on the stack, no particular placement is mandated by the C++ standard. It's entirely possible that a C++ implementation may exist whereby static blocks of memory are allocated on the heap, instead.

    A cache hit on a dynamic object will be just as fast as a cache hit on a static object. It just so happens that the stack is commonly kept in cache; you should try programming without the stack some time, and you might find that the cache has more room for the heap!

    BEFORE you optimise you should ALWAYS use a profiler to measure the most significant bottleneck! Then you should perform the optimisation, and then run the profiler again to make sure the optimisation was a success!

    This is not a machine-independent process! You need to optimise per-implementation! An optimisation for one implementation is likely a pessimisation for another.

    If so, why does it seem like memory pools are not used very often?

    The virtual memory abstraction described above, in conjunction with eliminating guess-work using cache profilers, virtually eliminates the usefulness of pools in all but the least-informed scenarios (that is, scenarios where no profiler was used).