c++ string string-view

How to detect use of std::string SSO (short string optimization)?


If I have a long string:

std::string data("This is a string long enough so that it is not a short string");

And then I have a view onto that string:

std::string_view dataView(std::begin(data) + 5, std::end(data) - 5);

If I move the original string:

std::string movedData(std::move(data));

Then I would expect the view dataView to remain valid.

But this assumption does not hold if the std::string short string optimization takes effect. In that case the characters live inside the string object itself rather than in dynamically allocated memory, so the move becomes (under the hood) a copy into the new object, and the view is left pointing at the old one.
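The effect can be observed without reading through a dangling view by comparing the buffer address before and after the move. This is only a minimal sketch: `bufferSurvivesMove` is a hypothetical helper, and the result is implementation behavior rather than anything the standard promises. On the common standard libraries I would expect `false` for a short string and `true` for a long one.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: reports whether a move construction handed over the
// same character buffer (true) or the characters ended up at a new address
// (false), which is what happens when the string is stored inline (SSO).
// The outcome depends on the standard library implementation.
bool bufferSurvivesMove(std::string s)
{
    const char* before = s.data();     // address of the characters before the move
    std::string moved(std::move(s));   // s stays alive, so `before` is still a valid pointer
    return moved.data() == before;     // equal only if the buffer itself was transferred
}
```

Calling it with "short" and with the long string from above shows the two cases. A view into the original string is only safe to keep in the case that returns `true`.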

Is there a way to detect SSO (so I can take appropriate action in my class move constructor)? And does the standard make any reference to SSO?

Context

I have a class that holds a URL as a string, and access to each part of the URL is held as a view into the original URL. Of course, calculating the views has a cost, but it's better to calculate them once than on each access (or that was my thought process). For a copy, you have to recalculate the views, but a move (I thought) does not need to recalculate them, as the underlying storage will be moved and thus the views will still be valid.

 class URL
 {
     std::string      url;
     std::string_view schema;
     std::string_view host;
     std::string_view path;
     // .. etc (for the multiple parts of a URL you can extract).
     // Note: Parsing a URL correctly is non-trivial (handling IPV6, etc.).
     //       So I don't want to do it that often.

     public:
         // Default constructor.
         URL() {}
         // Normal constructor: Accept input by copy/move
         URL(std::string urlInput)
             : url(std::move(urlInput))
         {
             // Compute Views.
         }
         // Copy constructor.
         URL(URL const& copy)
             : url(copy.url)
         {
             // Compute Views.
         }
         // Move constructor
         // I hoped I could simply swap the two objects.
         // This works if there is no short string optimization.
         URL(URL&& move) noexcept
         {
             swap(move);
         }
         // Assignment (both copy and move in one place).
         // Use standard copy and swap idiom.
         URL& operator=(URL assign) noexcept
         {
             swap(assign);
             return *this;
         }
         // Faithful swap function.
         void swap(URL& other) noexcept
         {
             using std::swap;
             swap(url, other.url);
             swap(schema, other.schema);
             swap(host,   other.host);
             swap(path,   other.path);
         }

         // Getter functions removed. But simply return std::string_view.
 };

Solution

  • Is there a way to detect SSO (so I can take appropriate action in my class move constructor)?

    Possibly, but if you need to detect an optimization, you're probably prematurely optimizing and/or writing fragile code. It would be better to rethink your approach. For the URL example, it would be simpler to store an offset and length for each piece instead of a string_view.

    The big advantage of storing offset and length is that this permits the compiler-generated copy and move operations to work correctly. There is no special logic required, not even re-computing the pieces in the copy constructor. The default logic just works.

    There is not much of a downside to storing offset and length. Storage size is the same if you use appropriately-sized integers, as a string_view also consists of two values, pointer and length. Runtime is almost the same; you would need to generate a string_view as needed, but this is just a step above trivial (construct the view from url.data() + offset and length).

    Skip the SSO detector and switch to "easy mode" (a.k.a. compiler-generated functions).
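    A minimal sketch of the offset-and-length version follows. The `Span` struct, the `view()` helper, the getter names and the deliberately naive `parse()` are my own placeholders, not part of the question's code; real URL parsing is far more involved.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

class URL
{
    // Hypothetical helper type: where one component sits inside `url`.
    struct Span
    {
        std::size_t offset = 0;
        std::size_t length = 0;
    };

    std::string url;
    Span        schema;
    Span        host;
    Span        path;

    // Build the view on demand from the current buffer of `url`.
    std::string_view view(Span s) const
    {
        return std::string_view(url).substr(s.offset, s.length);
    }

    // Naive split of "schema://host/path"; a stand-in for real parsing.
    void parse()
    {
        std::size_t sep = url.find("://");
        if (sep == std::string::npos) { return; }
        schema = {0, sep};

        std::size_t hostBegin = sep + 3;
        std::size_t pathBegin = url.find('/', hostBegin);
        if (pathBegin == std::string::npos) { pathBegin = url.size(); }
        host = {hostBegin, pathBegin - hostBegin};
        path = {pathBegin, url.size() - pathBegin};
    }

public:
    URL() = default;
    URL(std::string urlInput) : url(std::move(urlInput)) { parse(); }

    // No copy/move constructors or assignment operators are declared:
    // the compiler-generated ones copy or move `url` and the offsets,
    // and offsets stay correct whether or not SSO is in effect.

    std::string_view getSchema() const { return view(schema); }
    std::string_view getHost()   const { return view(host); }
    std::string_view getPath()   const { return view(path); }
};
```

    One caveat with this sketch: a moved-from URL keeps its old offsets while its string is in an unspecified state, so it should only be assigned to or destroyed, not queried.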

  • Does the standard make any reference to SSO?

    Not directly. However, it is notable that for containers, "passing a container as an argument to a library function" does not invalidate iterators unless otherwise specified ([container.reqmts]/67), but for strings, iterators may be invalidated when passing a string as an argument to a library function if the argument is "a reference to non-const basic_string" ([string.require]/4). This difference in requirements serves little purpose other than permitting SSO, so one could say the standard was engineered to permit SSO without naming (or requiring) it.