I wanted to index substrings of a std::string
whose lifetime lasts for the entire program, and relate each substring to an int
using a good old std::map
. The substrings are delimited by blank spaces ' '
. I generated the string and stored it to a file using this python code
import random
all_chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ 123456789!\"·$%&/()=?¿¡'|@#¬"
for i in range(0, 50000000):
r = int(random.random() * len(all_chars))
print(all_chars[r], end = '')
So, I wanted to compare the efficiency to split a std::string
between using a std::string
and std::string_view
to store the substrings via these two (simple) functions.
[[nodiscard]]
std::vector<std::string_view> split_sv(const std::string& s, char c) noexcept {
std::vector<std::string_view> res;
std::size_t b = 0;
std::size_t p = s.find(c);
while (p != std::string_view::npos) {
res.push_back( std::string_view{&s[b], p - b} );
b = p + 1;
p = s.find(c, b);
}
return res;
}
[[nodiscard]]
std::vector<std::string> split_s(const std::string& s, char c) noexcept {
std::vector<std::string> res;
std::size_t b = 0;
std::size_t p = s.find(c);
while (p != std::string::npos) {
res.push_back( s.substr(b, i - b) );
b = p + 1;
p = s.find(c, b);
}
return res;
}
The split algorithm is clearly faster for std::string_view
than it is for std::string
(as expected):
std::string_view 28.6 ms
std::string 107.6 ms
(flags -std=c++20 -O3
, gcc 11.4.0
)
However, storing these substrings into a std::map
is slower for std::string_view
than for std::string
.
{
const std::vector<std::string_view> tokens = split(s, ' ');
std::map<std::string_view, int> m;
for (const std::string_view& t : tokens) {
m.insert( {t, 0} );
}
}
{
const std::vector<std::string> tokens = split(s, ' ');
std::map<std::string, int> m;
for (const std::string& t : tokens) {
m.insert( {t, 0} );
}
}
The execution time of only the for loops above is:
std::string_view 445.9 ms
std::string 411.7 ms
(flags -std=c++20 -O3
, gcc 11.4.0
)
The difference is not that large, but I expected the results for std::string_view
to be faster in both portions of the program.
Why is std::map<std::string, int>
faster in gcc 11.4.0
? Is std::map<std::string, int>
generally faster than std::map<std::string_view, int>
?
Full C++ code in compiler explorer and also in quick-bench.
I've run the code with 3 different string seeds each with a different number of blank spaces (NBS): one with just one single blank space, one with 10 blank spaces, and another with 20 blank spaces. The table below shows the execution time (in seconds) for every different seed string. I thought it would be interesting since the quick-bench results do not agree with this.
NBS | Number tokens/ Average split length (chars) |
Function | map string_view |
map string |
unordered_map string_view |
unordered_map string |
---|---|---|---|---|---|---|
1 | 617864 / 83.97 | split | 0.010 | 0.153 | 0.023 | 0.025 |
store | 0.441 | 0.408 | 0.106 | 0.167 | ||
10 | 4988259 / 9.35 | split | 0.594 | 0.883 | 0.469 | 0.206 |
store | 4.430 | 3.806 | 1.361 | 1.470 | ||
20 | 8062256 / 5.20 | split | 0.696 | 0.981 | 0.559 | 0.298 |
store | 6.568 | 5.590 | 1.820 | 1.892 |
This is totally related to cache locality.
When you store data as std::string
s, the only necessary data will be placed locally in heap for the very limited number of strings you actually use. (For smaller strings SSO (Small String Optimization) also comes to play and these strings would be placed even "more locally"). This is quite cache-friendly.
When you work with the tree in the 'std::map' most of these strings will be addressed frequently which will increase cache usage.
When you use std::string_view
every reference to the actual data has to be done in a huge array (your initial huge string) and will be sparse. Since your array is large and doesn't fit into caches, intensive random jumping across large memory array will lead to inefficient cache usage.
Some details on the cause
Cache is usually loaded in lines (let’s say 64 bytes) and with values stored with high locality the chance to get many useful values with one load is higher. On the opposite, with lesser locality you will need to load more pages which will eventually lead to Thrashing and cache performance degradation.
On other causes and comparisons
Regarding other performance comparisons and concerns in the original question. I would suggest opening new questions on these specific topics, as SO recommends, since they are related to very subtle and specific performance effects and require additional measurements and research which would degrade the quality of current topic. Nevertheless, here is some input as the author requested for further research.
The unordered_map
require on average less reads from memory; while std::map require on average O(log n) reads, the unordered_map requires O(1) reads (even if you have some elements in a bulk of items for the same hash code), so effect of sparse memory addressing decreases dramatically. On the other hand, it comes with the price for hash codes calculation (low) and extra memory for hash table and reads from it (not so high). So, you have at least three opposite influencers here, but overall in most cases now the effect of sparse memory addressing is quite low and it comes to get benefits from std::string_view
way to avoid excessive memory allocations and object construction of 'std::string'. On the other hand, for small strings SSO saves 'std::string' here. So, this is a balance of at least five influencers which comes to play in this case.
Implementations from different libraries and compilers could differ in many terms, including approach, source code implementation, advanced CPU instructions usage, cache utilization, etc., so they definitely worth putting in separate questions with lots of research. It is not enough to measure only on one type of CPU/RAM, etc. to say that something of such matter is faster than something else; it could be much slower on another architecture. And it would be nice to separate measurements with different numbers of tokens and average length, since they would affect different influencers and in order to have full picture, they must be separated; performance would depend differently from both. Again, these effects are too subtle and too specific to cover in one question.