c++optimizationintrusive-containers

What is the advantage of embedding a linked list into a data structure?


While reading about kernel data structures in FreeBSD, I stumbled on the MBuf. The MBuf contains a pointer to the next MBuf in a chain of MBuf's, implementing a linked list. Each MBuf itself also contains data specific to that node in the linked list.

I'm more familiar with designs that segregate the container type from the value type (consider std::list, or System.Collections.Generic.LinkedList). I'm struggling to understand the value proposition of embedding container semantics into the data type - what efficiencies are gained? Is it really all about eliminating the node instance pointer storage?


Solution

  • Consider you have an iterator/pointer to a node in your list. In order to fetch the data you have to:

    On the other hand, if the list concept is "embedded" within your data structure, you can read your object in a single memory operation as it is together with the node itself.

    Another issue with separated list node and its data, is that the list node itself is small (usually just 2 or 3 pointers). As a result, the memory overhead of keeping such a small structure in memory can matter. You know -- operations such as new or malloc actually consume more memory than they allocate -- the system uses their own tree structures to keep track of where memory is free and where it is not.

    In such scenarios, it is beneficial to group things up into a single allocation operation. You could try to keep several list nodes in small bundles, or you can try to connect each node with the data it allocates.

    Similar strategy can be seen with intrusive pointers (versus shared pointers), or std::make_shared that packs object and smart pointer data together.


    zett42 makes a comment that std::list<T> keeps T together with the node data. This achieves the single memory block as I explained above, but has a different problem: T cannot be polymorphic. If you have a class A and its derivative B, then node<B> is not a derivative of node<A>. If you try hard to insert B into std::list<A>, your object will:

    A typical solution if you want to hold polymorphic objects in a single list is to actually have std::list<A*> instead of std::list<A>. But then you end up with the extra indirection I explained above.

    An alternative is to make an intrusive list (e.g. boost::intrusive::list), where the node information is actually a part of A object. Then each node can be a derivative of A without a problem.