What actually is done when `string::c_str()` is invoked?



  1. Does string::c_str() allocate memory, copy the string object's internal data into it, and append a null terminator to the newly allocated memory?

or

  2. Since string::c_str() must be O(1), allocating memory and copying the string over is no longer allowed. In practice, always keeping the null terminator in place is the only sane implementation.
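To make the cost of the first option concrete, here is a minimal sketch of what "allocate, copy, append '\0'" would look like. `copy_as_c_string` and `demo_copy_as_c_string` are hypothetical helpers written for illustration, not part of the standard library, and this is not how std::string::c_str() works since C++11. The copy is O(n) on every call. It also hands ownership of the new memory to the caller, whereas c_str() returns a plain `const char*` with nothing to free, so a real implementation of option 1 would also have to cache the copy inside the string object.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <string>

// Hypothetical helper illustrating option 1 (NOT how std::string::c_str()
// works since C++11): allocate a fresh buffer, copy, append '\0'.
std::unique_ptr<char[]> copy_as_c_string(const std::string& s) {
    std::unique_ptr<char[]> buf(new char[s.size() + 1]);  // room for '\0'
    std::memcpy(buf.get(), s.data(), s.size());           // O(n) copy on every call
    buf[s.size()] = '\0';                                 // append the terminator
    return buf;                                           // caller now owns the copy
}

void demo_copy_as_c_string() {
    const std::string s = "hello";
    std::unique_ptr<char[]> copy = copy_as_c_string(s);
    assert(std::strcmp(copy.get(), "hello") == 0);
    assert(copy.get() != s.c_str());  // a separate allocation, unlike the real c_str()
}
```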

In the comments on an answer to another question, someone says that C++11 requires std::string to allocate an extra char for a trailing '\0'. So it seems the second option is possible.

And another person says that std::string operations (e.g. iteration, concatenation and element mutation) don't need the zero terminator: unless you pass the string to a function expecting a zero-terminated string, it can be omitted.
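That point can be checked with a small sketch (the function name `demo_embedded_null` is just for illustration). std::string stores its length explicitly, so its own operations work from size() and can even hold an embedded '\0', while a C function such as std::strlen stops at the first '\0' it finds. This shows observable behavior only; it says nothing about how any particular implementation lays out its buffer.

```cpp
#include <cassert>
#include <cstring>
#include <string>

void demo_embedded_null() {
    const std::string s("ab\0cd", 5);     // pointer + explicit length
    assert(s.size() == 5);                // the embedded '\0' is an ordinary element
    assert(std::strlen(s.c_str()) == 2);  // a C function stops at the first '\0'

    const std::string t = s + "!";        // concatenation uses size(), not a terminator
    assert(t.size() == 6);
    assert(t[5] == '!');
}
```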

And here is another voice, from an expert:

Why is it common for implementers to make .data() and .c_str() do the same thing?

Because it is more efficient to do so. The only way to make .data() return something that is not null-terminated would be to have .c_str() or .data() copy the internal buffer, or to use two buffers. Always keeping a single null-terminated buffer means an implementation of std::string only ever needs one internal buffer.
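The single-buffer design described in that quote can be sketched with a toy class. `TinyString` is hypothetical and heavily simplified (no small-string optimization, no allocator support, no copying, no exception-safety guarantees); it only shows the idea that if the buffer always holds size() characters followed by '\0', then c_str() and data() can both return the same pointer in O(1).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical toy string: one heap buffer, always size_ chars + '\0'.
class TinyString {
public:
    TinyString() : size_(0), buf_(new char[1]) { buf_[0] = '\0'; }

    // size_ is declared before buf_, so it is initialized first.
    explicit TinyString(const char* s)
        : size_(std::strlen(s)), buf_(new char[size_ + 1]) {
        std::memcpy(buf_, s, size_ + 1);                 // copies the terminator too
    }

    TinyString(const TinyString&) = delete;              // copying omitted for brevity
    TinyString& operator=(const TinyString&) = delete;
    ~TinyString() { delete[] buf_; }

    void append(const char* s) {
        const std::size_t n = std::strlen(s);
        char* bigger = new char[size_ + n + 1];          // keep room for '\0'
        std::memcpy(bigger, buf_, size_);
        std::memcpy(bigger + size_, s, n + 1);           // includes the terminator
        delete[] buf_;                                   // old pointers now dangle
        buf_ = bigger;
        size_ += n;
    }

    std::size_t size() const { return size_; }
    const char* data() const { return buf_; }            // no copy: O(1)
    const char* c_str() const { return buf_; }           // same pointer as data()

private:
    std::size_t size_;
    char* buf_;
};

void demo_tiny_string() {
    TinyString s("abc");
    assert(s.c_str() == s.data());                       // one buffer serves both
    assert(s.c_str()[s.size()] == '\0');                 // terminator always present
    s.append("de");                                      // reallocates the buffer
    assert(std::strcmp(s.c_str(), "abcde") == 0);

    TinyString empty;                                    // even empty holds a '\0'
    assert(empty.c_str()[0] == '\0');
}
```

Note that append() has to reallocate and free the old buffer, which is why any pointer previously returned by c_str() must not be used afterwards.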

So I am really confused now: what actually happens when string::c_str() is invoked?

Update:

Suppose c_str() is implemented by simply returning a pointer to the buffer the string has already allocated and manages:

A. Since c_str() must return a null-terminated string, the internal buffer would always need to be null-terminated, even for an empty std::string. E.g., for std::string demo_str;, there should be a \0 in the internal memory of demo_str. Am I right?

B. What happens when std::string::substr() is invoked? Is a \0 automatically appended to the substring?


Solution

  • Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer, and both must be null-terminated (C++98/03 did not require this of data()). That effectively requires the internal buffer to always be null-terminated, though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.

    Prior to C++11, the behavior of c_str() was technically implementation-specific, but most implementations I've ever seen worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.

    UPDATE

    Since C++11, the buffer is always null-terminated, even for an empty string. However, that does not mean the buffer must be dynamically allocated when the string is empty: it could point to a small-string-optimization (SSO) buffer inside the string object, or even to a single static null character. Also, there is no guarantee that the pointer returned by c_str()/data() keeps pointing at the same memory address as the content of the string changes: operations that modify the string, such as append(), may reallocate the buffer and invalidate previously obtained pointers.

    std::string::substr() returns a new std::string with its own null-terminated buffer. The string being copied from is unaffected.
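The claims in this answer can be spot-checked against a real std::string with a few assertions. The function names are illustrative only, and the checks rely only on C++11 guarantees: c_str() and data() return the same pointer, the element at size() is '\0', an empty string yields a valid "", and substr() produces an independent string with its own terminator. The reallocation check deliberately avoids touching a pointer taken before the modification, since using it could be undefined behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// c_str()/data() share one buffer; the terminator sits just past size().
void demo_c_str_guarantees() {
    const std::string s = "hello";
    assert(s.c_str() == s.data());              // same buffer, no copy
    assert(s.c_str() == &s[0]);                 // points at the first element
    assert(s.size() == 5);                      // terminator not counted...
    assert(s.c_str()[s.size()] == '\0');        // ...but it is there
    assert(s[s.size()] == '\0');                // operator[](size()) yields '\0'
    assert(static_cast<std::size_t>(s.end() - s.begin()) == s.size());
}

// Empty strings still give a valid ""; re-fetch c_str() after modifying.
void demo_empty_and_reallocation() {
    const std::string empty;
    assert(empty.size() == 0);
    assert(empty.c_str()[0] == '\0');

    std::string s = "short";
    s.append(1000, 'x');                        // may reallocate; old pointers may dangle
    const char* p = s.c_str();                  // so fetch a fresh pointer
    assert(std::strncmp(p, "short", 5) == 0);
    assert(p[s.size()] == '\0');
}

// substr() returns an independent string; the source is left untouched.
void demo_substr() {
    const std::string s = "hello world";
    const std::string sub = s.substr(0, 5);     // "hello"
    assert(sub.size() == 5);
    assert(sub.c_str()[5] == '\0');             // sub's own terminator
    assert(s[5] == ' ');                        // s is unchanged at that index
    assert(sub.data() != s.data());             // separate buffers
}
```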