c++ string c-str

std::string::c_str & Null termination


I've read various descriptions of std::string::c_str, including questions raised on SO over the years, and I like this description for its clarity:

Returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object. This array includes the same sequence of characters that make up the value of the string object plus an additional terminating null-character ('\0') at the end.

However, some things about the purpose of this function are still unclear.

You could be forgiven for thinking that calling c_str might add a \0 character to the end of the string which is stored in the internal char array of the host object (std::string):

s[s.size()] = '\0';

But it seems std::string objects are null-terminated by default, even before calling c_str.

After looking through the definition:

const _Elem *c_str() const _NOEXCEPT
{   // return pointer to null-terminated nonmutable array
    return (this->_Myptr());
}

I don't see any code which would add \0 to the end of a char array. As far as I can tell, c_str just returns a pointer to the char stored in the first element of the array, pretty much like begin() does. I don't even see code which checks that the internal array is terminated by \0.

Or am I missing something?


Solution

  • Before C++11, there was no requirement that a std::string (or the templated class std::basic_string, of which std::string is an instantiation) store a trailing '\0'. This was reflected in the different specifications of the data() and c_str() member functions: data() returned a pointer to an array holding the characters, which was not required to be terminated with a '\0', while c_str() returned a pointer to an array that was required to end with a '\0' (an implementation could satisfy this with a separate copy or by writing the terminator into its own buffer on demand). Equally, there was no requirement NOT to store a trailing '\0' internally (accessing characters past the end of the stored data was undefined behaviour), and, for simplicity, some implementations chose to append a trailing '\0' anyway.

    With C++11, this changed. Essentially, the data() member function was specified as giving the same effect as c_str() (i.e. the returned pointer is to the first character of an array that has a trailing '\0'). That has a consequence of requiring the trailing '\0' on the array returned by data(), and therefore on the internal representation.

    So the behaviour you're seeing is consistent with C++11 - one of the invariants of the class is a trailing '\0' (i.e. constructors ensure that is the case, member functions which modify the string ensure it remains true, and all public member functions can rely on it being true).

    The behaviour you're seeing is not inconsistent with C++ standards before C++11. Strictly speaking, std::string before C++11 was not required to maintain a trailing '\0' but, equally, an implementer could choose to do so.
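
As a rough illustration of the C++11 guarantees described above, the following sketch checks them for a given string. The helper name check_terminator is made up for this example and is not part of any library; it assumes a compiler in C++11 mode or later.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper for illustration only: returns true if the
// C++11 terminator guarantees hold for s.
bool check_terminator(const std::string& s) {
    const char* p = s.c_str();
    return p == s.data()            // since C++11, data() and c_str() return the same pointer
        && p[s.size()] == '\0'      // the terminator sits right after the last character
        && s[s.size()] == '\0';     // const operator[] at size() is valid and refers to '\0'
}
```

Calling check_terminator on an empty string, on a freshly constructed string, or on a string that has just been modified (for example with += or resize()) should return true in each case, because every member function has to preserve the invariant. On a pre-C++11 implementation the first comparison was not guaranteed to hold.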