Summary:
It seems like just calling clear() on a vector is not enough to clear it.
vector_full_of_stuff.clear();
I had to call clear() and then shrink_to_fit() to completely delete all data inside of it.
vector_full_of_stuff.clear();
// AND THEN
vector_full_of_stuff.shrink_to_fit();
What gives? This became a problem because when I would call data() on a vector, it would include stuff that I thought should have been cleared when I called clear() earlier in the code.
Additional Details:
I am doing a computer networking assignment where I have to parse a PASV command result into an IP address and port number. While parsing a fictitious PASV command result separated by commas, I noticed that if I parse a three-digit number followed by a two-digit number, I get the third digit from the previous parse when calling data(), even though I shouldn't (?) because I called clear() before it.
Example:
PASV Command Result = 209,202,252,54,19,15
The "2" from "252" carries over into "19" when parsing.
Code:
// this one actually deletes data
void splitString(string str, char delimiter, vector<string> * out) {
    vector<char> word_buffer;
    for (int i = 0; i < str.length(); ++i) {
        if (str[i] == delimiter) {
            out->push_back(word_buffer.data());
            word_buffer.clear();
            word_buffer.shrink_to_fit();
        } else {
            word_buffer.push_back(str[i]);
        }
    }
    out->push_back(word_buffer.data());
    word_buffer.clear();
}
//
// this one doesn't
// the only thing that's different about this one
// is that it's missing shrink_to_fit()
void splitString(string str, char delimiter, vector<string> * out) {
    vector<char> word_buffer;
    for (int i = 0; i < str.length(); ++i) {
        if (str[i] == delimiter) {
            out->push_back(word_buffer.data());
            word_buffer.clear();
            // word_buffer.shrink_to_fit(); // need this to delete data
        } else {
            word_buffer.push_back(str[i]);
        }
    }
    out->push_back(word_buffer.data());
    word_buffer.clear();
}
//
// main driver code
int main() {
    vector<string> user_input_tokens;
    string port = "209,202,252,54,19,15";
    splitString(port, ',', &user_input_tokens);
    for (string str : user_input_tokens) {
        cout << str << ".";
    }
}
//
Expected Output:
209.202.252.54.19.15.
Actual Output:
209.202.252.542.192.152.
The vector's data() method returns a raw pointer to the vector's allocated array in memory. clear() destroys the contents of that array if needed and sets the vector's size() to 0, but does not reallocate the array itself, and thus does not change the vector's capacity(). For char elements there is nothing to destroy, so the old bytes simply remain in the array. Calling the vector's shrink_to_fit() method asks the vector to reallocate the array so its capacity() matches its size(), but the request is non-binding and is not guaranteed to actually do anything.
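Also, when constructing a std::string from a char* pointer by itself, the char data needs to be null-terminated, but your data is not. The constructor keeps reading until it finds a '\0', so it runs past the vector's current size() into whatever is left in the array (here, the stale "2" from "252"), which is undefined behavior. You need to push a null terminator into the vector before using data():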
void splitString(const string &str, char delimiter, vector<string> * out) {
    vector<char> word_buffer;
    for (string::size_type i = 0; i < str.length(); ++i) {
        if (str[i] == delimiter) {
            word_buffer.push_back('\0'); // terminate before reading via data()
            out->push_back(word_buffer.data());
            word_buffer.clear();
        } else {
            word_buffer.push_back(str[i]);
        }
    }
    if (!word_buffer.empty()) {
        word_buffer.push_back('\0');
        out->push_back(word_buffer.data());
    }
}
Otherwise, you can simply take the vector's size() into account when constructing the strings, no null terminators needed:
void splitString(const string &str, char delimiter, vector<string> * out) {
    vector<char> word_buffer;
    for (string::size_type i = 0; i < str.length(); ++i) {
        if (str[i] == delimiter) {
            out->push_back(string(word_buffer.data(), word_buffer.size()));
            // alternatively:
            // out->emplace_back(word_buffer.data(), word_buffer.size());
            word_buffer.clear();
        } else {
            word_buffer.push_back(str[i]);
        }
    }
    if (!word_buffer.empty()) {
        out->push_back(string(word_buffer.data(), word_buffer.size()));
        // alternatively:
        // out->emplace_back(word_buffer.data(), word_buffer.size());
    }
}
That being said, there are other ways to implement a splitString() function without needing the word_buffer vector at all, e.g.:
void splitString(const string &str, char delimiter, vector<string> * out) {
    string::size_type start = 0, pos = str.find(delimiter);
    while (pos != string::npos) {
        out->push_back(str.substr(start, pos - start));
        start = pos + 1;
        pos = str.find(delimiter, start);
    }
    if (start < str.size()) {
        if (start > 0) {
            out->push_back(str.substr(start));
        } else {
            out->push_back(str);
        }
    }
}
Or, using std::istringstream and std::getline() (requires <sstream>):
void splitString(const string &str, char delimiter, vector<string> * out) {
    istringstream iss(str);
    string word;
    while (getline(iss, word, delimiter))
        out->push_back(std::move(word));
}
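If you want to check how the getline()-based approach treats empty fields and a trailing delimiter, a small by-value variant is easier to poke at. This is only a sketch: splitToVector() is a hypothetical name, and returning the vector instead of filling an out parameter is a change made here for convenience, not something the original code does.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical by-value variant of the getline()-based splitString().
std::vector<std::string> splitToVector(const std::string &str, char delimiter) {
    std::vector<std::string> out;
    std::istringstream iss(str);
    std::string word;
    // getline() stops at each delimiter and fails once no input is left.
    while (std::getline(iss, word, delimiter)) {
        out.push_back(std::move(word));
        word.clear(); // put the moved-from string back into a known state
    }
    return out;
}
```

With this variant, "209,202,252,54,19,15" yields six tokens, an empty field in the middle (as in "a,,b") is kept as an empty string, and a trailing delimiter (as in "a,b,") does not produce an extra empty token.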
But, even if you wanted to buffer the words manually, a std::string would have made more sense than a std::vector<char>, especially since you are outputting std::string values:
void splitString(const string &str, char delimiter, vector<string> * out) {
    string word_buffer;
    for (string::size_type i = 0; i < str.length(); ++i) {
        if (str[i] == delimiter) {
            out->push_back(std::move(word_buffer));
            word_buffer.clear();
        } else {
            word_buffer.push_back(str[i]);
        }
    }
    if (!word_buffer.empty()) {
        out->push_back(std::move(word_buffer));
    }
}
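Since the end goal in the question is an IP address and a port number, here is a rough sketch of that last step once the splitting works. It assumes the input is just the six comma-separated numbers (h1,h2,h3,h4,p1,p2) with the surrounding reply text already stripped, and it uses the usual FTP convention that the port is p1 * 256 + p2. PasvEndpoint and parsePasv() are hypothetical names invented for this example.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result type for this sketch.
struct PasvEndpoint {
    std::string ip;
    int port;
};

// Assumes `fields` looks like "h1,h2,h3,h4,p1,p2" with nothing else around it.
PasvEndpoint parsePasv(const std::string &fields) {
    std::vector<int> nums;
    std::istringstream iss(fields);
    std::string word;
    while (std::getline(iss, word, ',')) {
        nums.push_back(std::stoi(word)); // throws if a field is not numeric
    }
    if (nums.size() != 6) {
        throw std::invalid_argument("expected six comma-separated numbers");
    }
    for (int n : nums) {
        if (n < 0 || n > 255) {
            throw std::out_of_range("each field must be in the range 0..255");
        }
    }
    PasvEndpoint ep;
    ep.ip = std::to_string(nums[0]) + "." + std::to_string(nums[1]) + "." +
            std::to_string(nums[2]) + "." + std::to_string(nums[3]);
    ep.port = nums[4] * 256 + nums[5]; // high byte first, then low byte
    return ep;
}
```

For the example string from the question, "209,202,252,54,19,15", this gives the address 209.202.252.54 and port 19 * 256 + 15 = 4879.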