c++ language-lawyer c++20 std-ranges size-t

Can std::ranges::enumerate enumerate any possible valid C++ array?


Once I asked whether std::ranges::views::enumerate uses the wrong type (long) for indexing on GCC, but apparently that's not the case, because

std::views::enumerate is specified to use range_difference_t<Base> as its index value.

However, from the draft

The type size_t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object ([expr.sizeof]).

So one can imagine an array so long that the indices of its trailing elements, from some point onward, don't fit the type enumerate uses for indexing, while still fitting in std::size_t by definition, provided that on a given machine the following holds

static_assert(std::numeric_limits<std::size_t>::max() > std::numeric_limits<long>::max());

For instance, such an array would be this:

std::vector<int> v(std::numeric_limits<std::size_t>::max());

for which I'd wonder how k in the following evolves:

auto w = v | std::ranges::views::enumerate;
for (auto [k, _] : w) {
    std::cout << k << std::endl;
}

Unfortunately, std::vector doesn't seem to allow that size at all, throwing an exception, whereas a std::array of that same size isn't even compilable (full example here), but I assume that neither behavior is mandated by the standard. Or is it?


Solution

  • No, it is possible that enumerate can't enumerate all elements of an array.

    But that isn't anything unusual. It is a general problem that has always existed in C and C++.

    For example if std::size_t and std::ptrdiff_t have the same bit width and the implementation actually permits objects to have the size std::numeric_limits<std::size_t>::max(), then

    char arr[std::numeric_limits<std::size_t>::max()];
    

    will be valid, but std::numeric_limits<std::ptrdiff_t>::max() will be smaller than the maximum index of the array. range_difference_t for the array is also std::ptrdiff_t and so enumerate will eventually overflow causing undefined behavior.

    However, the exact same problem occurs with any form of pointer differences into the array, not just enumerate. For example &arr[std::numeric_limits<std::size_t>::max()-1] - &arr[0] also has type std::ptrdiff_t and then has undefined behavior because the value is not representable.

    So, practically speaking, the implementation can't allow arrays that are larger than the maximum value of std::ptrdiff_t (instead of std::size_t) in order to avoid completely breaking the standard library and a lot of user code.

    The same applies to the use of size_type and difference_type in ranges/containers. In practice the maximum permitted size must be limited to the maximum of difference_type (rather than size_type) to avoid these UB edge cases from being possible. There is no guarantee that the size of a container can reach the maximum value of size_type. Containers usually have a max_size member function to tell the actual maximum size.

    All of this is why a common opinion is that size_t/size_type was a mistake and that everything should have been defined with ptrdiff_t/difference_type as the basis instead. I guess that opinion is also why difference_type was chosen as the index type in enumerate (and other views).