I know that a trivial std::string_view
is not guaranteed to be null-terminated. However, I don't know if a std::string_view
literal is guaranteed to be null-terminated.
For example:
#include <string_view>
using namespace std::literals;
int main()
{
    auto my_sv = "hello"sv;
}
Does C++17 or later guarantee that my_sv.data()
is null-terminated?
=== Below is updated ===
All of the below is from n4820:
- As per 5.13.5.14, a string literal is null-terminated.
- As per 5.13.8, a user-defined-string-literal is composed of a string literal plus a custom suffix. For example, in "hello"sv, hello is the string literal and sv is the suffix.
- As per 5.13.8.5, "hello"sv is treated as a call of the form operator "" sv(str, len); as per 5.13.5.14, str is null-terminated.
- As per 21.4.2.1, sv's data() must return str.
Do they prove that "hello"sv.data() is guaranteed to be null-terminated by the C++ standard?
So let's get the simple parts out of the way. No string_view
is ever "NUL-terminated", in the sense that the object represents a sized range of characters. Even if you create a string_view
from a NUL-terminated sequence of characters, the string_view
itself is still not "NUL-terminated".
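To make that concrete, here is a minimal sketch. The helper names first_word and c_length are made up for illustration. It shows a view whose data() points into a NUL-terminated literal while the view itself covers only part of it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Hypothetical helper: returns a view of only the first five characters.
inline std::string_view first_word() {
    std::string_view full = "hello world"; // views a NUL-terminated literal
    return full.substr(0, 5);              // views "hello" only
}

// Hypothetical helper: the length a C-style scan starting at data() would report.
// Calling strlen here is only well-defined because the view points into a literal.
inline std::size_t c_length(std::string_view sv) {
    return std::strlen(sv.data());
}
```

Here first_word().size() is 5, but c_length(first_word()) scans on to the end of the underlying literal and reports 11. The terminator belongs to the array, not to the view.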
The question you're really asking is this: does the implementation have some leeway to make the statement "some literal"sv
yield a string_view
whose data
member does not point into the NUL-terminated string literal represented by "some literal"
? That is, is this:
string_view s = "some literal"sv;
permitted to behave in any way differently from this:
const char *lit = "some literal";
string_view s(lit, <number of chars in lit>);
In the latter case, s.data()
is guaranteed to be a pointer to the string literal, and thus you could treat that pointer as a pointer to a NUL-terminated string. You're asking if the former is just as valid.
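As a sketch of the two forms side by side (the variable names and check_forms are mine, and 12 is simply the number of characters in "some literal"), the expectation can be written as compile-time checks. They should hold on a conforming implementation if the reasoning in the rest of this answer is right:

```cpp
#include <cassert>
#include <string_view>

using namespace std::literals;

// Form 1: the user-defined literal.
constexpr std::string_view from_udl = "some literal"sv;

// Form 2: an explicit pointer and length.
constexpr const char* lit = "some literal";
constexpr std::string_view from_ptr(lit, 12);

static_assert(from_udl == from_ptr);
static_assert(from_udl.size() == 12);

// Reading the element at size() is only acceptable here because each view
// covers an entire string literal, whose array includes the terminator.
static_assert(from_udl.data()[from_udl.size()] == '\0');
static_assert(from_ptr.data()[from_ptr.size()] == '\0');

// Hypothetical runtime version of the same checks.
inline void check_forms() {
    assert(from_udl == from_ptr);
    assert(from_udl.data()[from_udl.size()] == '\0');
}
```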
Let's investigate. The operator""sv overloads are specified as:
constexpr string_view operator""sv(const char* str, size_t len) noexcept;
Returns: string_view{str, len}.
That is the standard specification for the behavior of this function: it returns a string_view
which points into the memory supplied by str
. Therefore, the implementation cannot allocate some hidden memory and use that or whatever; the returned string_view::data
is required to return the same pointer as str
.
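You can observe the same contract with a user-defined literal of your own. This is only a sketch: _view, last_str and data_is_str are hypothetical names, and the operator merely mirrors the "Returns" wording above rather than being the library's actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical bookkeeping: remembers the pointer the compiler passed in.
inline const char* last_str = nullptr;

// Hypothetical UDL that mirrors "Returns: string_view{str, len}".
inline std::string_view operator""_view(const char* str, std::size_t len) noexcept {
    last_str = str;
    return std::string_view{str, len};
}

// True if the returned view starts at exactly the pointer the operator received.
inline bool data_is_str() {
    std::string_view s = "hello"_view;
    return s.data() == last_str && s.size() == 5;
}
```

Since string_view{str, len} stores the pointer it was given, data() has nowhere else to point.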
Now, this brings us to a different question: is str required to be a NUL-terminated string? That is, is it legal for a compiler to see that you are using the sv UDL implementation and therefore remove the NUL character from the array it was going to create for the string literal passed as str?
Let's look at how UDLs for strings work:
If L is a user-defined-string-literal, let str be the literal without its ud-suffix and let len be the number of code units in str (i.e., its length excluding the terminating null character). The literal L is treated as a call of the form operator "" X(str, len).
Note two phrases in that passage: "the literal without its ud-suffix" and "excluding the terminating null character". We know the behavior of "the literal without its ud-suffix": it is an ordinary string literal. And the second phrase makes specific mention of the expected NUL terminator for str. I'd say that's a pretty clear statement that str will be given a string literal. That string literal will be built in accordance with the regular string literal rules in C++, and therefore will be NUL-terminated.
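A small sketch of what the operator actually receives, using two hypothetical literal operators (_terminated and _len are names I made up for this purpose):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical UDL: reports whether str[len] is the terminating null character.
constexpr bool operator""_terminated(const char* str, std::size_t len) {
    return str[len] == '\0';
}

// Hypothetical UDL: exposes len exactly as the compiler computed it.
constexpr std::size_t operator""_len(const char*, std::size_t len) {
    return len;
}

static_assert("hello"_terminated); // the terminator sits just past len
static_assert("hello"_len == 5);   // len excludes the terminator
static_assert(""_terminated);      // even an empty literal has one
static_assert("a\0b"_len == 3);    // an embedded null still counts as a code unit
```

If a compiler were allowed to drop the terminator from the array, the constant evaluation of str[len] would read outside it and the first assertion could not compile.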
Given the above, I think it is safe to say that there is no wiggle room for the implementation here. The string_view
returned by the UDL must point to the array defined by the string literal specified in the UDL, and like any other string literal, that array will be NUL-terminated.
That having been said, please review my first paragraph. You should not write any code which assumes that a string_view
is NUL-terminated. I would call it a code smell even if the creator of the string_view
and its consumer are right next to each other.
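If a consumer really needs a NUL-terminated string, one option is to make that explicit by copying into a std::string. A minimal sketch, where c_api_length stands in for some C API and both function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <string_view>

// Hypothetical consumer that requires a NUL-terminated string.
inline std::size_t c_api_length(const char* s) {
    return std::strlen(s);
}

// Hypothetical wrapper: accepts any view and copies it, so termination no
// longer depends on where the view came from.
inline std::size_t safe_length(std::string_view sv) {
    std::string owned(sv); // c_str() on a std::string is always NUL-terminated
    return c_api_length(owned.c_str());
}
```

This costs a copy, but it gives the right answer for a view such as std::string_view("hello world").substr(0, 5), where passing data() straight to the C API would not.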