c++performanceiteratorc++17forward-list

Construct new forward_list from iterators into existing forward_list via node splicing


If you have a std::list, and want to rebuild it into a new std::list in a different order, without copying, from iterators into the original std::list, it's fairly straightforward, for example, if you're shuffling the iterators and rebuilding from the shuffled order:

int main(int argc, char **argv)
{
    std::list<std::string> args(&argv[1], &argv[argc]);
    std::vector<std::list<std::string>::const_iterator> vec;
    for (auto it = std::cbegin(args), last = std::cend(args); it != last; ++it) {
        vec.push_back(it);
    }

    std::shuffle(std::begin(vec), std::end(vec), std::mt19937(42));  // Shuffle with reproducible PRNG

    std::cout << "Shuffled vec:\n";
    for (auto it : vec) {
        std::cout << *it << std::endl;
    }

    std::list<std::string> shuffled_args;
    for (auto it : vec) {
        shuffled_args.splice(std::end(shuffled_args), args, it);
    }

    std::cout << "Shuffled list:\n";
    for (const auto& s : shuffled_args) {
        std::cout << s << std::endl;
    }

    return 0;
}

This works great; on my system, when compiled with g++ -std=c++17 -O3 -flto -Wall shuffle_list.cpp and run with ./a.out a b c d e, the results from printing both the vector and the shuffled list agree (e a c d b in that order): Try it online!

But when I try to write a variant version using std::forward_list, things get dicier. This is the only version that doesn't segfault (with comments indicating changes):

int main(int argc, char **argv)
{
    std::forward_list<std::string> args(&argv[1], &argv[argc]);
    std::vector<std::forward_list<std::string>::const_iterator> vec;
    for (auto it = args.cbefore_begin(), last = std::cend(args); std::next(it) != last; ++it) { // Changed to store iterators before each element since splice_after needs them
        vec.push_back(it);
    }

    std::shuffle(std::begin(vec), std::end(vec), std::mt19937(42));

    std::cout << "Shuffled vec:\n";
    for (auto it : vec) {
        std::cout << *std::next(it) << std::endl;  // Must advance each iterator to see value stored
    }

    std::forward_list<std::string> shuffled_args;
    auto splice_loc = shuffled_args.cbefore_begin();      // Must begin inserting at beginning of new forward_list
    for (auto it : vec) {
        shuffled_args.splice_after(splice_loc, args, it); // splice_loc is before when node should go, it points to node before node to splice, great
        splice_loc = it;                                  // it should now be last element, it will be next splice location
    }

    std::cout << "Shuffled list:\n";
    for (const auto& s : shuffled_args) {
        std::cout << s << std::endl;
    }

    return 0;
}

The output here is the same from the vector, but for the resulting forward_list it just outputs e: Try it online! If you try other fun things like replacing splice_loc = it; with ++splice_loc; (which is logically equivalent; you spliced in a node after the current location, and advancing the iterator should move you to that new node), it segfaults.

I think I know why this is broken (it's two different ways for the full code and the replacement with ++splice_loc), and I don't think the approach I'm taking is salvageable. In the segfaulting code, as I splice nodes over, the iterators remain valid, but some of the iterators in the original structure are moved before I get to them (e.g. I use position 1's iterator to move the item at 2, and when I try to move 3 via 2, it moves some other random thing that follows 2 in the new list), and now I'm trying to splice in what follows them in the new forward_list (and violating the API, as I claim they came from args, when they've in fact been moved to shuffled_args at that point). There's no good fix for this in the design I've chosen AFAICT. Update: In the non-segfaulting code, I should be saving off std::next(it) before the splice and assigning that to splice_loc (by assigning it, which is probably still in the original forward_list, the splice_loc now points to the original list and I'm probably engaging in undefined behavior that ultimately modifies the original list more than the new one).

My question is:

Is there a good way to do what I want, taking iterators from a forward_list, jumbling them up (with shuffle or sorting or whatever), then building a new forward_list in an alternate order, by direct node transfer (no copies or moves of any of the contained elements), and doing so efficiently? The best I can come up with is making a new dedicated forward_list for each node in the vector, so I can splice that single node over to the final forward_list. This feels ugly, and possibly more expensive than the list equivalent behavior. Is there a better way? Is the many one-element forward_list solution actually great?


For the curious: This is a stripped down reproducer of a problem I'm having writing a keyed sorting algorithm (as in Schwartzian transform aka decorate-sort-undecorate), entirely as a personal challenge. I'm trying to make a specialized version for std::list and std::forward_list that avoids any copies or moves of the values being sorted by decorating the iterators rather than the values themselves, so once I've sorted the collection by the computed keys, I can then construct a new list or forward_list by using splice/splice_after to rebuild the container in the sorted order, without copying or moving a single one of the contained values.


Solution

  • This is a working version based on my original suggestion to use a bunch of single node std::forward_lists. It feels inefficient to make all this std::forward_lists, but from what I can tell, std::forward_list is spec-ed so narrowly that it's almost guaranteed to be implemented as a wrapper around a single pointer, which should be pretty low overhead. And it does work, with no copies or moves of the contained elements, nor (thanks to the use of deque) any copies or moves of the forward_lists themselves (aside from when we empty them to splice them onto the output forward_list), and it only traverses the input forward_list once (destroying it by extracting the first node over and over until it's emptied).

    It's not the prettiest, but it's not nearly as ugly or inefficient as I was expecting.

    int main(int argc, char **argv)
    {
        std::forward_list<std::string> args(&argv[1], &argv[argc]);
        std::deque<std::forward_list<std::string>> deq;  // Use deque so we never have to move existing forward_lists, and don't need to pre-walk to find size to reserve for vector
        while (!args.empty()) {
            auto& flist = deq.emplace_back();
            flist.splice_after(flist.cbefore_begin(), args, args.cbefore_begin());  // Extract a single node from input to populate new forward_list
        }
    
        std::shuffle(std::begin(deq), std::end(deq), std::mt19937(42));  // Shuffle with reproducible PRNG
    
        std::cout << "Shuffled deq:\n";
        for (auto& flist : deq) {
            std::cout << flist.front() << std::endl;
        }
    
        std::forward_list<std::string> shuffled_args;
        auto splice_loc = shuffled_args.cbefore_begin();
        for (auto&& flist : deq) {
            shuffled_args.splice_after(splice_loc, std::move(flist));  // splice single element forward_list contents onto end
            ++splice_loc;  // We added one element, move the iterator forward by one
        }
    
        std::cout << "Shuffled list:\n";
        for (const auto& s : shuffled_args) {
            std::cout << s << std::endl;
        }
    
        return 0;
    }
    

    Try it online!

    To be clear, if anyone has any better solutions, I'm happy to hear about them and would love to give the checkmark to something cleaner.