Tags: c++, c++20, stdstring, std-span

Create std::string from std::span of unsigned char


I am using a C library which uses various fixed-sized unsigned char arrays with no null terminator as strings.

I've been converting them to std::string using the following function:

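auto uchar_to_stdstring(const unsigned char* input_array, int width) -> std::string {
  // Copy exactly `width` bytes; the source array has no null terminator.
  std::string temp_string(reinterpret_cast<const char*>(input_array), width);
  // Strip trailing spaces. If the field is all spaces, npos + 1 wraps to 0 and the string is emptied.
  temp_string.erase(temp_string.find_last_not_of(' ') + 1);

  return temp_string;
}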

This works fine, apart from the use of reinterpret_cast, the need to pass the array size, and the fact that the array decays into a pointer. I'm trying to avoid all of these issues by using std::span.

The function that uses std::span looks like this:

auto ucharspan_to_stdstring(const std::span<unsigned char>& input_array) -> std::string {
  std::stringstream temp_ss;

  for (const auto& input_arr_char : input_array) {
    temp_ss << input_arr_char;
  }

  return temp_ss.str();
}

The function works and makes everything else simpler, since I no longer have to track the C array's size. However, some benchmarking (using nanobench) shows that the new function is many times slower than the classic reinterpret_cast method. My assumption is that the for loop in the std::span-based function is the source of the inefficiency.

My question: Is there a more efficient method to convert a fixed-size C array of unsigned chars from a std::span variable to a std::string?


Edit:

gcc benchmark (-O3 -DNDEBUG -std=gnu++20, nanobench, minEpochIterations=54552558, warmup=100, doNotOptimizeAway)

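| relative |  ns/op |             op/s | err% |   ins/op | bra/op | miss% |  total | uchar[] to std::string |
|---------:|-------:|-----------------:|-----:|---------:|-------:|------:|-------:|:-----------------------|
|   100.0% |   5.39 |   185,410,438.12 | 0.3% |    80.00 |  20.00 |  0.0% |   3.56 | uchar                  |
|     2.1% | 253.06 |     3,951,678.30 | 0.6% | 4,445.00 | 768.00 |  0.0% | 167.74 | ucharspan              |
| 1,244.0% |   0.43 | 2,306,562,499.69 | 0.2% |     9.00 |   1.00 |  0.0% |   0.29 | ucharspan_barry        |
|    72.8% |   7.41 |   134,914,127.56 | 1.3% |    99.00 |  22.00 |  0.0% |   4.89 | uchar_bsv              |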

clang benchmark (-O3 -DNDEBUG -std=gnu++20, nanobench, minEpochIterations=54552558, warmup=100, doNotOptimizeAway)

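| relative |  ns/op |           op/s | err% |   ins/op | bra/op | miss% |  total | uchar[] to std::string |
|---------:|-------:|---------------:|-----:|---------:|-------:|------:|-------:|:-----------------------|
|   100.0% |   2.13 | 468,495,014.11 | 0.2% |    14.00 |   1.00 |  0.0% |   1.42 | uchar                  |
|     0.8% | 251.74 |   3,972,418.54 | 0.2% | 4,477.00 | 767.00 |  0.0% | 166.30 | ucharspan              |
|   144.4% |   1.48 | 676,329,668.07 | 0.1% |     7.00 |   0.00 | 95.8% |   0.98 | ucharspan_barry        |
|    34.5% |   6.19 | 161,592,563.70 | 0.1% |    80.00 |  24.00 |  0.0% |   4.08 | uchar_bsv              |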

(uchar_bsv in the benchmarks is the same as ucharspan_barry, but with a std::basic_string_view<unsigned char const> parameter instead of std::span<unsigned char const>.)


Solution

  • You want:

    auto ucharspan_to_stdstring(std::span<unsigned char const> input_array) -> std::string {
        return std::string(input_array.begin(), input_array.end());
    }
    

    string, like other standard library containers, is constructible from an appropriate iterator pair, and this is such a pair. Since these are random access iterators, the constructor can compute the length up front and do a single allocation.

    Note that I changed from span<T> const& to span<T const>, for two reasons. First, you're not mutating the contents of the span, so the inner type needs to be const... similar to how you took a T const*, not a T*. Second, you should take spans by value because they're cheap to copy (unless you very specifically need the identity of the span, which you don't here).

    It may be better to do a reinterpret_cast so that you can use the (char const*, size_t) constructor - this one ensures a single memcpy for the eventual write. But you'd have to time it to see if it's worthwhile.