I created a little parser and would like to make the looped part as fast as possible. Thus I would like to avoid copying using move semantics instead. But I can't understand how to combine move semantics and loops. My loop looks similar to this:
string word = "";
string s = "pip pip pop som"
unordered_map<string, int> dic;
for (char c : s) {
if ('a' <= c && c <= 'z') { // lower case
word.push_back(c);
}
else if ('A' <= c && c <= 'Z') { // upper case
word.push_back(c + 32);
}
else if (!word.empty()) { // word ended
++dic[word]; // I would like to avoid copying here
word.clear();
}
}
if (!word.empty()) {
++dic[word];
}
dic would contain:
"som" 1
"pip" 2
"pop" 1
I tried to std::move(word)
into the dic
, but it didn't work.
Can you, please advice on how to avoid copying here, and to use std::move
?
I sadly can't create the word
variable inside the loop.
When I made a minimal example for the question, it got even more confusing:
#include <iostream>
#include <utility>
#include <map>
#include <ranges>
int main()
{
using namespace std;
map<string, int> m;
string word;
for (auto i : views::iota(0, 10)) {
word.push_back('a');
++m[word]; // I would like to avoid copying here
word.clear();
}
for (auto [word, count] : m) {
cout << word << ' ' << count << endl;
}
return 0;
}
Depending on what you write in the first loop, the output will be different.
word.push_back('a');
++m[word];
word.clear();
outputs:
a 10
word.push_back('a');
++m[word];
outputs:
a 1
aa 1
aaa 1
aaaa 1
aaaaa 1
aaaaaa 1
aaaaaaa 1
aaaaaaaa 1
aaaaaaaaa 1
aaaaaaaaaa 1
word.push_back('a');
++m[move(word)];
outputs:
a 4
aa 3
aaa 2
aaaa 1
The last one is confusing me. Looks like it doesn't move the value, but it also doesn't copy it on every iteration.
I can imagine, why it doesn't move - because the variable word
is being used on the next iterations, and it's lifetime is longer.
But why it copies so chaotically ?
TL;DR: You cannot gain performance by moving instead of copying the string when you still need its value in the next iteration.
Nothing really weird going on.
First case:
word.push_back('a'); ++m[word]; word.clear();
Each iteration you add an 'a'
to the string, you increment the count for the string "a"
, you clear the string to start next iteration with an empty word
. You do that 10 times. Hence after the loop the count for "a"
is 10
.
Next case:
word.push_back('a'); ++m[word];
Each iteration you add one more 'a'
to word
and increment the count of the current word
. You do that 10 times. After the loop you have incremented the count of 10 different words, each one character longer than the previous one.
Last case:
word.push_back('a'); ++m[move(word)];
The move constructor of std::string
leaves the moved from string in a valid but unspecified state (read more here). The contents of word
could in principle be anything after you moved from it.
What you see is effect of short string optimization. Short strings are stored directly in the std::string
. No dynamic allocations are made. Hence there is nothing to be moved. Moving the short string is effectively copying it. Only after the string is so long it cannot use short string optimization the moved from string is empty after the move. None of this can be counted on. As mentioned above, the moved from string is in unspecified state. You are not supposed to move from it when you still need the value and when you need the moved from string empty you have to call clear
.
I understand, why it doesn't move - because the variable word is being used on the next iterations, and it's lifetime is longer.
Your understanding is wrong in this point. ++m[move(word)];
does move the string in every iteration. That you use the string in the next iteration is not relevant. By calling std::move(word)
you pretend that word
will not be used anymore. If you use it nevertheless you are on your own.
As mentioned in a comment, ++m[std::move(word)];
only does the move when word
is not found in the map. And same goes for the copy. ++m[word];
only copies word
when it is not in the map. When it is in the map then []
looks up the element and ++
increments it. For example in your first case only a single copy is made.