cdata-structuresforkshared-memory

shared red black tree among processes


I want to create a red-black tree that it is shared among different forked processes.

Linking to an old question of mine old_question there cannot be resizable data structures in the shared memory (since shared regions are created for a given parent process and its children they are not generically shared, show every shared memory needs to be preallocated by the main process (the one that started them all)) so a typical red-black tree implementation will not work.

How would I have to deal with allocation for this particular task?
(I know how to implement a red-black tree generally but in this situation I am a little bit stuck.)

"the usual alternative is to use indexes into an array instead" was the answer to my old question. This was fine for what I did back then, I implemented a circular queue using an array located in the shared memory and replaced the whole list logic with that. I do not recall any implementation of a red black tree using arrays though.


Solution

  • "the usual alternative is to use indexes into an array instead" was the answer to my old question this was fine for what i did back then, I implemented a circular queue using an array located in the shared memory and replaced the whole list logic with that. I do not recall any implementation of a red black tree using arrays though.

    Your circular queue was an alternative that worked in that situation, but it was not an implementation of what I was actually describing. The key point here is that for any kind of data structure consisting of nodes that reference each other, you have the alternative of implementing those references with integers instead of with pointers. You can build either a linked list or a tree in this way, for example. This depends on all the nodes residing in an array, and manipulating it dynamically relies on you to implement some kind of management of which elements of that array are available for (re)use.

    Using indexes instead of pointers makes the inter-node references insensitive to the node addresses, such that the data structure can accommodate being located at different addresses in different processes sharing it. Similarly, such a data structure can relatively easily be relocated in memory. And having the nodes in an array can ensure that they reside in shared memory, which allocating nodes with malloc() definitely will not do.

    Here is a way the data could look:

    #define NUM_NODES 65536
    
    enum rb_node_color { RED_NODE, BLACK_NODE };
    
    struct rb_node {
        int key;
        enum rb_node_color color;
        long parent_index;
        long left_child_index;
        long right_child_index;
    };
    
    struct rb_node nodes[NUM_NODES];
    
    long tree_root_index;
    

    And here is a comparison between how you would perform certain example operations with pointer-based nodes and how you would perform them with index-based nodes:

    Operation Pointer version Index version
    whole-node value *node_ptr nodes[node_idx]
    left child access via node_ptr->left_child_ptr nodes[node_idx].left_child_index
    change color node_ptr->color = BLACK_NODE nodes[node_idx].color = BLACK_NODE
    null check node_ptr == NULL node_idx < 0

    I presume that you get the point. There is a limited number of nodes, but you can make that as many as you want (chosen in advance). Other than that, anything you can do with pointer-based nodes you can do with index-based ones.

    The remaining potential issue is how to manage the available node objects as nodes are added to and removed from the tree. In my previous answer, I described that as manual memory management, but that should not daunt you. For this purpose, it would be easy to initially arrange the nodes into an (index-based) linked list of available nodes, from which you can then draw nodes at need, and to which you can return nodes when they are no longer in use. You could repurpose one of the existing links for this, or add a for-purpose one.