It's not clear to me how ranges::any_view<T, C>
works, especially in relation to what T
is, i.e. a &
, a const&
, a value type, or what else.¹
There isn't really much about it here, yet, nor I have found much around, beside this very useful explanation of why ranges::begin
can't be called on const ranges::any_view
.
Here's one such example which puzzles me a lot, and below are the relavants bit of it. I think that thoroughly understanding why the examples linked below work (or don't) the way they do, should help me understand what any_view
actually is.
This function accepts any range of std::string
s and print
s each of them (the filter(always(true))
is letting everything pass):
void test(any_view<std::string> parts) {
for_each(parts | filter(always(true)), print);
};
Now, if I pass an lvalue of type any_view<std::string const&>
any_view<std::string const&> view{foos};
into it,
test(view);
all is fine, but if I first pipe that into cache1
²,
test(view | cache1);
then the all the std::string
s result empty.
By trial and error I realized that changing
test(any_view<std::string> parts) {
to
test(any_view<std::string const&> parts) {
solves the problem, but I don't really understand why.
But if I then was to also change
any_view<std::string const&> view{foos};
to
any_view<std::string> view{foos};
Furthermore, manually inlining the body of test
at the call site, also seems to solve the issue.
And to make the thing trickier, when I try to change std::string
for a Foo
that holds a std::string
(and sets it to ""
in the destructor, to hopefully make possible UB a bit more "observable"), the issue seems to go away.
I know there's no any_view
in c++20, and I guess there won't be one in c++23 either, but is there a reason (beside performance) why it's not been included yet or won't be included in the foreseeable future? I mean, to me it looks like it does serve a ligitimate purpose, so what else in the standard library is or will fill that hole?
(¹) It's a bit clearer what the role of C
is, especially because I had an experience of downgrading an any_view<SomeType, sized>
to any_view<SomeType, forward>
for the purpose of allowing storing a filter_view
, which is not sized, into it.
(²) Yes, the cache1
is useless here, but in my real usecase I had view | transform(f) | filter(g)
, so I inserted cache1
in between to prevent f
from being called twice for each entry x
in view
which passes the test g(f(x))
.
This is a combination of a few things:
any_view<T>
, the reference
type of the range is T
. any_view<string>
is a range of prvalue string
, while any_view<string const&>
is a range of lvalues of string const
.any_view<T>
accepts any range who reference is convertible to T
, not same as T
. So you can construct an any_view<string>
from a range whose reference type isn't just string
or even char const*
but also, relevantly here, string&&
.cache1
is implemented in range-v3 is probably not the best way to do this, and looks like:
range_value_t<Rng>&& read() const {
if (parent_->dirty_) { /* update cache */ }
return std::move(*parent_->cache_);
}
filter
requires dereferencing iterators more than once (once to see if it passes the filter, and then the second time for when you actually use the value).So when you do this:
vector<string> foos = {...};
any_view<string> parts = foos | cache1;
foos | cache1
is a range of string&&
(which is an xvalue into cache1
's internal cache, foos
is unaffected - it could've been const
too). We're converting that to any_view<string>
, which is now a range of string
, which means that somewhere in all these types we have an iterator whose operator*()
returns a string
from the result of dereferencing (foos | cache1)
's iterator. That will move from cache1
's cache. So if you dereference twice, you will end up with an empty cache.
Then when you do parts | views::filter(pred)
, because you dereference every iterator twice (once to check if it satisfies the predicate, the second to print it), you will end up moving from the cached string
right before you print it, so you only get empty strings.
The problem here is probably that cache1
is a range of range_value_t<V>&&
, which can lend itself to this double-dereference issue. If it were a range of range_value_t<V>&
instead, then this would just work (at the cost of every dereference of your any_view<string>
leading to a string
copy). On the other hand, if you won't dereference the iterator more than once, you do want to just move, since if you were doing something like:
some | pipeline | cache1 | to<std::vector>();
You don't want to have to remember to write:
some | pipeline | cache1 | as_rvalue | to<std::vector>();
Note that a similar problem exists for views::concat(r1, r2)
if r1
is a range of string
and r2
is a range of string&&
, you will end up with a range whose reference type is string
, such that dereferencing that iterator will end up moving from the elements of r2
. Which is what you want to happen if you're doing views::concat(r1, r2) | to<vector>()
, but is definitely want you don't want to happen if you're doing views::concat(r1, r2) | filter(p) | ...
It's hard to know what the right answer is.