I was trying to extract all words (sequences of non-whitespace characters) from a file. In doing so, I accidentally created an infinite loop: at the end of the file no further word is extracted, but the stream is not considered exhausted. Note: I realize that simply using std::views::istream<std::string>(file_stream) would have solved my problem, but I am interested in why this happens.
My code (compiled with Clang 18.1.0 and the flags -std=c++23 -stdlib=libc++):
    #include <cctype>
    #include <cstdio>    // EOF
    #include <format>
    #include <iostream>
    #include <iterator>  // std::istreambuf_iterator
    #include <ranges>
    #include <sstream>
    #include <string>
    #include <vector>

    constexpr auto is_white_space = [](char ch) constexpr {
        return std::isspace(static_cast<unsigned char>(ch));
    };

    struct word_extractor {
        std::string word;

        friend std::istream &operator>>(std::istream &s, word_extractor &we) {
            // Skip leading whitespace, then collect characters up to the next whitespace.
            std::string buff = std::ranges::subrange(std::istreambuf_iterator{s},
                                                     std::istreambuf_iterator<char>{})
                             | std::views::drop_while(is_white_space)
                             | std::views::take_while([](auto x) {
                                   return !is_white_space(x);
                               })
                             | std::ranges::to<std::string>();
            //if (s.peek() == EOF) s.get(); // uncommenting this code makes it work
            we.word = buff;
            return s;
        }
    };

    int main() {
        std::istringstream file_stream("lorem ipsum dolor sit amet ");
        auto parsed_words = std::views::istream<word_extractor>(file_stream)
                          | std::views::transform([](const word_extractor &word_extractor) {
                                return word_extractor.word;
                            })
                          | std::ranges::to<std::vector<std::string>>();
        for (auto w : parsed_words) {
            std::cout << std::format("{{{}}}\n", w);
        }
    }
Output with if (s.peek() == EOF) s.get() uncommented:

    {lorem}
    {ipsum}
    {dolor}
    {sit}
    {amet}

Without if (s.peek() == EOF) s.get() there is no output, because of the infinite loop.
Without the commented-out line that manually consumes EOF, the code gets stuck in an infinite loop, because std::views::istream<word_extractor>(file_stream) keeps calling operator>> forever. Why is the stream not exhausted, given that I first consume all whitespace characters and then all non-whitespace ones?

Question: Is there a way to make this kind of extraction work with C++ ranges, or is the (ugly) manual check for EOF needed?
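The behaviour behind the loop can be reproduced in isolation. The sketch below uses hypothetical names (stream_flags, drain_with_streambuf_iterators) and only illustrates how the stream state evolves: reading everything through std::istreambuf_iterator goes straight to the stream buffer and leaves the stream's flags alone, whereas peek() at the end of input sets eofbit, and a following get() then sets failbit.

```cpp
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical record of the stream flags observed at each step.
struct stream_flags {
    std::string text;        // everything read through the streambuf iterators
    bool eof_after_drain;    // eofbit right after draining the buffer
    bool fail_after_drain;   // failbit right after draining the buffer
    bool eof_after_peek;     // eofbit after calling peek() at end of input
    bool fail_after_get;     // failbit after calling get() at end of input
};

stream_flags drain_with_streambuf_iterators(std::istream& s) {
    stream_flags r{};
    // istreambuf_iterator reads from s.rdbuf() directly; it never calls
    // s.setstate(), so the stream still looks "good" afterwards.
    r.text = std::string(std::istreambuf_iterator<char>{s},
                         std::istreambuf_iterator<char>{});
    r.eof_after_drain = s.eof();
    r.fail_after_drain = s.fail();
    s.peek();                // unformatted input function: notices EOF, sets eofbit
    r.eof_after_peek = s.eof();
    s.get();                 // stream is no longer good, so failbit is set
    r.fail_after_get = s.fail();
    return r;
}
```

Since std::views::istream stops only once the stream reports failure, this would explain why the peek()/get() pair ends the loop while the pipeline alone does not.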
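As pointed out by T.C., streambuf iterators operate on the stream buffer and do not change the state of the stream, so failbit is never set and views::istream<word_extractor> never sees the end of the input. To still use a range pipeline, another views::istream<char> can be used instead of ranges::subrange():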
    struct word_extractor {
        std::string word;

        friend std::istream &operator>>(std::istream &s, word_extractor &we) {
            s >> std::noskipws; // don't skip ws
            std::string buff = std::views::istream<char>(s) | // read single char from stream
                               std::views::drop_while(is_white_space) |
                               std::views::take_while([](auto x) { return !is_white_space(x); }) |
                               std::ranges::to<std::string>();
            we.word = buff;
            return s;
        }
    };
Note, however, that the whitespace is still needed to separate the words, so s >> std::noskipws is used to prevent operator>> for char from skipping it.