c++undefined-behaviorstd-rangesc++23

std::views::chunk, std::views::transform, inputs ranges, and ill-formedness


Consider program 1:

#include <algorithm>
#include <iostream>
#include <ranges>
#include <sstream>

int main() {
    std::istringstream iss{"\x{01}\x{02}\x{03}\x{04}"};

    auto view = std::views::istream<char>(iss) | std::views::chunk(2) |
                std::views::transform([](auto chunk) {
                    return std::ranges::fold_left(chunk, 0, std::plus{});
                });

    static_assert(
        std::ranges::input_range<decltype(std::views::istream<char>(iss))>);
    static_assert(not std::ranges::forward_range<
                  decltype(std::views::istream<char>(iss))>);

    for (const auto i : view) {
        std::cout << i;
    }

    std::cout << '\n';
}

It could look a little contrived, but I would argue that such a pattern could be an alternative to consider for some real-world cases of range encoding. The point of views::istream in the code above is to get an input_range that is not a forward_range, something we'd want to support in general (files, networks, etc.).

The program above prints "37" on godbolt. Is that what it should do? It could look like that, but that program is in fact ill-formed!

And that is not only some language-lawyer talk, it is in fact very concrete. Consider program 2:

#include <algorithm>
#include <iostream>
#include <ranges>
#include <sstream>

int main() {
    std::istringstream iss{"\x{01}\x{02}\x{03}\x{04}"};

    auto view = std::views::istream<char>(iss) | std::views::chunk(2) |
                std::views::transform([](auto chunk) {
                    return std::ranges::fold_left(chunk, 0, std::plus{});
                });

    static_assert(
        std::ranges::input_range<decltype(std::views::istream<char>(iss))>);
    static_assert(not std::ranges::forward_range<
                  decltype(std::views::istream<char>(iss))>);

    auto it = view.begin();

    std::cout << *it << *it << '\n';
}

That program prints "30" on godbolt, i.e. the dereferenced value at the start of the view is both 3 and 0!

Such multiple dereferencing is exactly what std::views::cartesian_product will do.

Those programs are ill-formed because std::view::transform's argument is not std::regular_invocable. It is not std::regular_invocable because it changes its own argument, which std::regular_invocable functions are required not to do. The reason why it changes its own argument is that that argument, a chunk, is an input range that is not a forward range, i.e. it cannot be accessed without being consumed.

In practice, upon second invocation for the same top view position, the folding will get an empty range and hence yield zero.

As mentioned in that answer, there is P3138 aka views::cache_last, proposed for standardization, which, added to the pipeline, would avoid multiple evaluations, but as far as I understand, even with that, the programs above would still be formally ill-formed.

I'm assuming relying on ill-formedness is not an option so is there a plan to make the programs above, with views::cache_last somehow added to the pipeline, not ill-formed? Maybe std::views::transform could have a views::cache_last option, which, if true, would make it OK to consume the input, or something similar?


Solution

  • An other thing that you can do to avoid ill-formedness is to remove the transform.

    #include <algorithm>
    #include <iostream>
    #include <ranges>
    #include <sstream>
    
    auto sum_chunk(auto chunk) {
        return std::ranges::fold_left(chunk, 0, std::plus{});
    }
    
    int main() {
        std::istringstream iss{"\x{01}\x{02}\x{03}\x{04}"};
    
        auto view = std::views::istream<char>(iss) | std::views::chunk(2);
    
        for (const auto chunk : view) {
            std::cout << sum_chunk(chunk);
        }
    }
    

    Or you can implement your own version of views::transform, which only requires invocable rather than regular_invocable and only provides an input_range as it's result.

    template <typename F>
    struct input_transform : std::ranges::range_adaptor_closure<input_transform<F>>
    {
        F f;
     
        template <std::input_range R>
        requires std::invocable<const F&, std::ranges::range_reference_t<R>>
        auto operator()(R&& r) -> std::generator<std::invoke_result_v<const F&, std::ranges::range_reference_t<R>>> const {
            for (auto && v : std::forward<R>(r)) { 
                co_yield std::invoke(f, std::forward<std::ranges::range_reference_t<R>>(v));
            }
        }
    };
    

    Is there a plan to make the programs above, with views::cache_last somehow added to the pipeline, not ill-formed?

    No. The problem isn't that you are applying transform to an input range, but that the range value type is itself an input range. The following has the same problem, and view is a random_access_range. That's why views::transform requires std::regular_invocable.

    std::vector<std::istringstream> isss{
        {"\x{01}\x{02}"},
        {"\x{03}\x{04}"}};
    
    auto view = isss | 
                std::views::transform([](auto& iss) {
                    return std::views::istream<char>(iss);
                }) |
                std::views::transform([](auto chunk) {
                    return std::ranges::fold_left(chunk, 0, std::plus{});
                });