c++cartesian-productstd-rangesc++23

Am I misunderstanding how `std::views::cartesian_product` is supposed to work?


Edit: see also a connected question.


I have tested the following program on godbolt:

#include <algorithm>
#include <iostream>
#include <ranges>
#include <sstream>

int main() {
    std::istringstream iss{"\x{01}\x{02}\x{03}\x{04}"};

    auto view = std::views::istream<char>(iss) | std::views::chunk(2) |
                std::views::transform([](auto t) {
                    return std::ranges::fold_left(t, 0, std::plus{});
                });

    for (const auto& p :
         std::views::cartesian_product(view, std::views::iota(0, 3))) {
        const auto [s, i] = p;
        std::cout << s << ", " << i << '\n';
    }

    return 0;
}

The result with both GCC and clang is:

3, 0
0, 1
0, 2
7, 0
0, 1
0, 2

Since the first std::views::cartesian_product argument is supposed to be only passed through a single time, shouldn't it rather be the following?

3, 0
3, 1
3, 2
7, 0
7, 1
7, 2

I saw a relevant question in a comment that then was removed: dereferencing the iterator returned by view.begin() three times gives 3, 0, 0, which should be the root cause.


Solution

  • The problem here is the transform actually.

    There are two facts that we're running afoul of:

    1. Input ranges can only be traversed one single time. Doing so consumes them.
    2. transform(f) must be equality-preserving. That is, the function, f, that you are providing must return the same answer given the same input.

    When you're doing fold_left in the transform, the argument there (t) is an input range (the originating istream<char> is an input range, and then chunk-ing it gives you an input range of input ranges). fold_left has to traverse that range, which consumes it. But you can only do that one single time. The second time you do it, there's nothing left to traverse, so you have en empty range.

    Put differently, you're violating the expectation of iterators, which is:

    auto it = /* some iterator */;
    auto v1 = *it;
    auto v2 = *it;
    assert(v1 == v2);
    

    By providing a callable which consumes its input, it's preventing this iterator dereference from producing the same value every time. And that's the behavior you see, you get the expected answer the first time (3 and then 7) but then garbage every other time (0 because that's what your fold gives on an empty range).


    There are two ways to fix this.

    One is to make the range non-input, for instance by first consuming the whole views::istream<char> into a vector<char> and then doing the chunk | transform(fold) on the resulting vector. Now our chunks aren't input-only, so the fold can give the same answer every time.

    The other is to "fix" the result of the transform in the literal sense by avoiding recalculating it. range-v3 has an adaptor for this (cache1) and there is one proposed for standardization called cache_last P3138:

    auto view = views::istream<char>(iss)
              | views::chunk(2)
              | views::transform([](auto t) {
                  return ranges::fold_left(t, 0, plus{});
                })
              | views::cache_last // <== add me
              ;   
    

    cache_last simply caches the last value, ensuring that *it repeatedly always produces the same thing since we never go back to the underlying iterator. Since we're only calling fold exactly one time, we don't run into the input issue.

    I'm not sure if there's really a good way of diagnosing such mis-uses.