I read the C++ reference and there are three ways to erase from a std::set:
void erase (iterator position)
size_type erase (const value_type& val)
void erase (iterator first, iterator last)
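For concreteness, a minimal example of the three overloads in use (my own illustration, not from the reference page):
#include <set>

int main() {
    std::set<int> s{1, 2, 3, 4, 5};
    s.erase(s.find(3));          // by iterator: removes the pointed-to element
    auto count = s.erase(4);     // by value: returns the number of elements removed (0 or 1 for a set)
    s.erase(s.begin(), s.end()); // by range: removes [first, last)
    (void)count;
}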
Assuming I already have the iterator, which is faster: erasing by iterator or erasing by value?
For performance-related questions, it is important to carefully define the operations, data, software stack, and hardware of interest, and then actually do the measurements. Since developing code involves weighing alternatives, it is often useful to know at least roughly the magnitude of the performance difference. The results can be surprising and informative.
For this question I will assume that we want to understand the performance of the different std::set erase methods for the type uint64_t on a dataset of 1M elements. Since OP did not specify, I will use some common hardware / software combinations that are available to me.
The following code measures the time to delete all 1M elements from a std::set using both iterators and values. It does the deletions in three different orders: sorted, reverse-sorted, and random, which yields six measurements. The code and build instructions are available on GitHub.
#include "core/util/tool.h"
#include "core/chrono/stopwatch.h"
#include "core/util/random.h"
template<class Work>
void measure(std::ostream& os, std::string_view desc, Work&& work) {
chron::StopWatch timer;
timer.mark();
work();
auto millis = timer.elapsed_duration<std::chrono::milliseconds>().count();
os << fmt::format("{:>16s}: {:5d} ms", desc, millis) << endl;
}
int tool_main(int argc, const char *argv[]) {
ArgParse opts
(
argValue<'n'>("number", 100000, "Number of elements"),
argFlag<'v'>("verbose", "Verbose diagnostics")
);
opts.parse(argc, argv);
auto n = opts.get<'n'>();
// auto verbose = opts.get<'v'>();
using Set = std::set<uint64_t>;
using Elements = std::vector<Set::value_type>;
using Iterators = std::vector<Set::iterator>;
Set src_data;
Elements elements;
for (auto i = 0; i < n; ++i) {
src_data.insert(i);
elements.push_back(i);
}
Set data = src_data;
Iterators iterators(n);
for (auto i = 0; i < n; ++i)
iterators[i] = data.find(i);
measure(cout, "iterator-ordered", [&]() {
for (auto iter : iterators) {
data.erase(iter);
}
});
data = src_data;
for (auto i = 0; i < n; ++i)
iterators[i] = data.find(i);
std::reverse(iterators.begin(), iterators.end());
measure(cout, "iterator-reverse", [&]() {
for (auto iter : iterators)
data.erase(iter);
});
data = src_data;
for (auto i = 0; i < n; ++i)
iterators[i] = data.find(i);
std::shuffle(iterators.begin(), iterators.end(), core::rng());
measure(cout, "iterator-random", [&]() {
for (auto iter : iterators)
data.erase(iter);
});
data = src_data;
measure(cout, "value-ordered", [&]() { ;
for (auto value : elements)
data.erase(value);
});
data = src_data;
std::reverse(elements.begin(), elements.end());
measure(cout, "value-reverse", [&]() { ;
for (auto value : elements)
data.erase(value);
});
data = src_data;
std::shuffle(elements.begin(), elements.end(), core::rng());
measure(cout, "value-random", [&]() { ;
for (auto value : elements)
data.erase(value);
});
return 0;
}
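If you do not have the framework above, the same comparison can be approximated with only the standard library. The following is a self-contained sketch of my own (the names and the fixed seed are arbitrary choices) that reproduces the two random-order cases with std::chrono:
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <random>
#include <set>
#include <vector>

int main() {
    constexpr std::size_t n = 1'000'000;
    std::mt19937_64 rng{42};

    // Build the source set and a shuffled copy of its values.
    std::set<std::uint64_t> src;
    std::vector<std::uint64_t> values;
    for (std::uint64_t i = 0; i < n; ++i) {
        src.insert(i);
        values.push_back(i);
    }
    std::shuffle(values.begin(), values.end(), rng);

    // Time a callable and return elapsed milliseconds.
    auto time_ms = [](auto&& work) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    };

    // value-random: each erase repeats the O(log n) lookup.
    auto data = src;
    auto value_ms = time_ms([&] {
        for (auto v : values)
            data.erase(v);
    });

    // iterator-random: the lookup cost is paid up front via find(),
    // so the timed loop only does the node removal.
    data = src;
    std::vector<std::set<std::uint64_t>::iterator> iters;
    iters.reserve(n);
    for (auto v : values)
        iters.push_back(data.find(v));
    auto iter_ms = time_ms([&] {
        for (auto it : iters)
            data.erase(it);
    });

    std::cout << "value-random:    " << value_ms << " ms\n"
              << "iterator-random: " << iter_ms << " ms\n";
}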
A typical run produces something like the following:
$ make set_erase_performance && set_erase_performance -n 1000000
iterator-ordered:    23 ms
iterator-reverse:    31 ms
 iterator-random:   110 ms
   value-ordered:    65 ms
   value-reverse:    79 ms
    value-random:   376 ms
If we take the iterator-ordered performance on the M1 as 1, here are the relative times across a selection of hardware / software combinations (Apple M1 Max / macOS 13, Intel Xeon E5-2698 / Ubuntu 20.04, AMD 7601 / Ubuntu 22.04):
| algo \ cpu       |  M1 | Intel | AMD |
|------------------|----:|------:|----:|
| iterator-ordered | 1.0 |   1.3 | 2.3 |
| iterator-reverse | 1.4 |   7.3 | 4.2 |
| iterator-random  | 4.8 |   6.3 |  21 |
| value-ordered    | 2.8 |   5.7 | 7.0 |
| value-reverse    | 3.4 |   7.1 | 8.4 |
| value-random     |  16 |    28 |  71 |
As expected, erasing by iterator is the fastest method: erase(iterator) removes the node directly in amortized constant time, while erase(value) must first repeat the O(log n) lookup. More usefully, we now have data on how much faster it is, which is often the actual question of interest. The variation across both platforms and orderings is also interesting.