c++ istream std-ranges c++23

Why is std::views::istream not exhausted with take_while?


I was trying to extract all words (sequences of non-whitespace characters) from a file. In doing so, I accidentally created an infinite loop: at the end of the file no further word is extracted, but the stream is not exhausted either. I realize that simply using std::views::istream<std::string>(file_stream) would have solved my problem, but I am interested in why this happens.

My code (compiler: Clang 18.1.0, flags: -std=c++23 -stdlib=libc++):

#include <cctype>
#include <format>
#include <iostream>
#include <iterator>
#include <ranges>
#include <sstream>
#include <string>
#include <vector>

constexpr auto is_white_space = [](char ch) constexpr {
    return std::isspace(static_cast<unsigned char>(ch));
};

struct word_extractor {
    std::string word;

    friend std::istream &operator>>(std::istream &s, word_extractor &we) {
        std::string buff = std::ranges::subrange(std::istreambuf_iterator{s},
                                                 std::istreambuf_iterator<char>{}) 
                         | std::views::drop_while(is_white_space)
                         | std::views::take_while([](auto x) { 
                               return !is_white_space(x); 
                           })
                         | std::ranges::to<std::string>();

        //if (s.peek() == EOF) s.get(); // uncommenting this code makes it work
        we.word = buff;
        return s;
    }
};

int main() {
    std::istringstream file_stream("lorem ipsum dolor sit amet ");

    auto parsed_words = std::views::istream<word_extractor>(file_stream)
                      | std::views::transform([](const word_extractor &word_extractor)
                        {
                            return word_extractor.word;
                        })
                      | std::ranges::to<std::vector<std::string>>();

    for (auto w : parsed_words) {
        std::cout << std::format("{{{}}}\n", w);
    }
}

Output with if (s.peek() == EOF) s.get(); enabled:

{lorem}
{ipsum}
{dolor}
{sit}
{amet}

Without if (s.peek() == EOF) s.get(); there is no output, because the program loops forever.

Without the commented-out line that manually consumes EOF, the code is stuck in an infinite loop, because std::views::istream<word_extractor>(file_stream) keeps calling operator>> forever. Why is the stream not exhausted, given that I first consume all whitespace characters and then all non-whitespace ones?
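The loop can be reproduced without hanging the program. The following is a bounded sketch, assuming that views::istream<T> keeps producing elements for as long as the stream converts to true after each `s >> value`. The names streambuf_word and count_successful_extractions are hypothetical, and the extraction is written as a plain loop over std::istreambuf_iterator instead of the range pipeline.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for word_extractor: skips whitespace, then collects
// non-whitespace characters, reading only through the stream buffer.
struct streambuf_word {
    std::string word;

    friend std::istream& operator>>(std::istream& s, streambuf_word& w) {
        w.word.clear();
        std::istreambuf_iterator<char> it{s};
        std::istreambuf_iterator<char> end{};
        while (it != end && std::isspace(static_cast<unsigned char>(*it))) {
            ++it;  // drop leading whitespace
        }
        while (it != end && !std::isspace(static_cast<unsigned char>(*it))) {
            w.word += *it;  // take the word's characters
            ++it;
        }
        return s;  // the stream's state flags were never touched
    }
};

// Counts how many times `s >> value` succeeds, giving up after `limit` tries
// so that a stream that never fails cannot hang the caller.
template <class T>
std::size_t count_successful_extractions(std::istream& s, std::size_t limit) {
    T value;
    std::size_t n = 0;
    while (n < limit && (s >> value)) {
        ++n;
    }
    return n;
}
```

With the input from main and a limit of 100, count_successful_extractions<std::string> stops after 5 extractions, while count_successful_extractions<streambuf_word> reaches the limit: 5 words followed by empty extractions that still leave the stream in a good state.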

Question: Is there a way to make this kind of extraction work with C++ ranges, or is the (ugly) manual check for EOF needed?


Solution

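  • As pointed out by T.C., streambuf iterators do not change the state of the stream: std::istreambuf_iterator reads directly from the stream buffer, so reaching the end of input never sets eofbit or failbit on the std::istream, and views::istream only stops once the stream reports failure. To keep using a range pipeline, an inner views::istream<char> can be used instead of ranges::subrange():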

    struct word_extractor {
        std::string word;
    
        friend std::istream &operator>>(std::istream &s, word_extractor &we) {
            s >> std::noskipws;                                // don't skip ws
            std::string buff = std::views::istream<char>(s) |  // read single char from stream
                               std::views::drop_while(is_white_space) |
                               std::views::take_while([](auto x) { return !is_white_space(x); }) |
                               std::ranges::to<std::string>();
            we.word = buff;
            return s;
        }
    };
    

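    Note, however, that the whitespace characters are still needed as separators, so s >> std::noskipws is applied to keep operator>> for char from skipping them. The flag stays set on the stream after operator>> returns. The last word is also only delivered if whitespace follows it, as in the example input; otherwise the char extraction that fails at end of input leaves the stream in a failed state, and views::istream<word_extractor> discards that word.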
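T.C.'s point about stream state can be checked in isolation. The following is a minimal sketch and not part of the original answer; drain_result and drain_with_streambuf_iterators are hypothetical names introduced for illustration. It drains a stream through std::istreambuf_iterator, records the state flags, and then performs the peek()/get() pair from the question's workaround.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical record of what the stream reported at each step.
struct drain_result {
    std::string text;               // everything read via the stream buffer
    bool eof_after_drain = false;   // s.eof() right after draining
    bool fail_after_drain = false;  // s.fail() right after draining
    bool eof_after_peek = false;    // s.eof() after a peek() at end of input
    bool fail_after_get = false;    // s.fail() after a get() at end of input
};

drain_result drain_with_streambuf_iterators(std::istream& s) {
    drain_result r;

    // Reads through s.rdbuf() only; the istream's state is not involved.
    r.text.assign(std::istreambuf_iterator<char>{s},
                  std::istreambuf_iterator<char>{});
    r.eof_after_drain = s.eof();
    r.fail_after_drain = s.fail();

    // peek() is a member of the stream itself, so it can record end of input.
    s.peek();
    r.eof_after_peek = s.eof();

    // get() on a stream that is no longer good() reports failure.
    s.get();
    r.fail_after_get = s.fail();
    return r;
}
```

For a std::istringstream holding "a b ", both flags are still clear after draining, eof_after_peek is true, and fail_after_get is true. This matches the question's observation: only the manual peek()/get() pair puts the stream into the failed state that ends the views::istream loop.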