sortingdata-structureslinked-listinsertred-black-tree

Most efficient data structure for inserting and sorting


I need an efficient data structure for storing, I need to insert and maybe order. However, keeping the order after every insert is not necessary and I think that sortings are much less than inserts.

I was considering red-black trees but I'm not sure how fast it is inserting a node in an RB tree (compared to inserting it in a list for example); however, sorting in an RB tree is much more time-efficient.

What data structure is the most efficient for that?

Thanks for your time


Solution

  • I'm not sure how fast it is inserting a node in an RB tree (compared to inserting it in a list for example

    Insertion has this average time complexity:

    Traversing all values in sorted order:

    So for asymptotically increasing data sizes, insertion will eventually run faster on an RB than on a sorted list, although for small sizes the list can be faster (as it has less constant overhead). The actual tipping point will depend on implementation aspects, including the programming language and the structure of the values to compare. But insertion into a non sorted list will of course outperform both.

    Sorting a list on demand as is needed for an unsorted list, has a cost, but it is "only" O(nlogn) compared to O(n). So if sorting doesn't have to happen that frequently, it may be a viable option. Again, the tipping point -- as where the overall running time of several inserts and sorts is faster than the alternatives -- depends on implementation aspects.

    What data structure is the most efficient for that?

    In practice I have found B+ trees to be a good choice for fast insertion. Just like RB trees they have O(logn) insertion time, but one can tune the data structure with varying block sizes, trying to find out which one works best for your actual case. This is not possible with RB trees. Also B+ trees have the sorted list sitting in a linked list of sorted blocks, so iteration in sorted order is trivial. Nothing much is going to beat the speed of that.

    Another interesting alternative is a skip list. It resembles a B+ tree a bit, but its operations are easier to implement. It uses a factor more memory (same complexity) by the absence of blocks and more pointers.

    Which one will work the best depends on implementation/platform factors. In the end you'll want to implement some alternatives and compare them with benchmark tests.