algorithmdata-structuresb-treesegment-treermq

Range minimum query, dynamic array, interval tree, treap


I need an algorithm with some data structure in Python that at every step when two new elements e1, e2 are given:

And this step must be done in no more than logarithmic time, because when this step is repeated N times the overall worst-case time complexity cannot be far from O(NlogN).

-- For example: my_list = [(2,1), (4,3), (5,7), (9,1)]

As we see, the element 2 is paired with its assigned value 1, the element 4 is paired with value 3, 5 is paired with value 7, and 9 with value 1. And my_list is ordered by the first elements of the pairs.

Now, two elements are given, e1 = 3, e2 = 6.

The insertion position in my_list of (e1, ) == (3, ) is index 1, and the insertion position of (6, ) is index 3.

The maximum value found in the elements of my_list between indices 1 and 3 is value 7, because the maximum value of (4,3), (5,7) is 7.

Imagining that the constant to add is 1 we have: maximum value found + constant == 7 + 1 == 8. And we had e2 == 6, so the pair to insert is (6, 8) at the index 3.

And at the end of this step my_list must be: [(2,1), (4,3), (5,7), (6,8), (9,1)]

-- This question linked here is very similar but differs with my question on the index where the element is inserted. In that question the element is added at the end (appended), in my case, the insertion must be done in a way that preserves the order of the elements such that the start and end of the next arbitrary interval can be found in logarithmic time. This is why I think that, besides using a range minimum query, I will also need to use some advanced data structure like an interval tree or a treap.


Solution

  • For this type of job, I usually use an augmented B+ tree. See here for what a B+ tree is: https://en.wikipedia.org/wiki/B%2B_tree

    Starting with a B+ tree, I would augment all of the internal nodes so that along with each pointer to a child node, it stores the maximum value associated with all keys in the subtree rooted at that child node.

    That extra information makes it easy to calculate the maximum value for any interval in O(log N), and it's easy to maintain as you insert, delete, and modify items in the tree without changing the O(log N) complexity of those operations.

    For an in-memory B+ tree, a maximum of 8 or so keys per node is usually performant.