Tags: c++, parsing, split, token, tokenize

Parse (split) a string in C++ using string delimiter (standard C++)


I am parsing a string in C++ using the following:

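#include <sstream>
#include <string>
using namespace std;

string parsed, input = "text to be parsed";
stringstream input_stringstream(input);

// getline reads up to the next ' '; it only accepts a single-char delimiter
if (getline(input_stringstream, parsed, ' '))
{
     // do some processing.
}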

Parsing with a single char delimiter is fine. But what if I want to use a string as the delimiter?

Example: I want to split:

scott>=tiger

with >= as delimiter so that I can get scott and tiger.


Solution

  • You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

    Example:

    std::string s = "scott>=tiger";
    std::string delimiter = ">=";
    std::string token = s.substr(0, s.find(delimiter)); // token is "scott"
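
    The snippet above only yields "scott". To get both halves, as the question asks, the same find() and substr() calls can be wrapped in a small helper. This is a minimal sketch: the name split_once and the choice to return an empty second element when the delimiter is missing are assumptions made for illustration, not part of the standard library.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits `s` at the first occurrence of `delimiter`.
// If the delimiter is absent, the whole string is returned as the first
// element and the second element is left empty (an assumed convention).
std::pair<std::string, std::string> split_once(const std::string& s,
                                               const std::string& delimiter) {
    const std::string::size_type pos = s.find(delimiter);
    if (pos == std::string::npos) {
        return {s, std::string()};
    }
    // Text before the delimiter, and text after it (delimiter skipped).
    return {s.substr(0, pos), s.substr(pos + delimiter.length())};
}
```

    Calling split_once("scott>=tiger", ">=") gives a pair holding "scott" and "tiger". Checking for std::string::npos explicitly keeps the "delimiter not found" case from being confused with a real match.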
    

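    If the delimiter occurs more than once, remove each extracted token (delimiter included) from the front of the string before the next extraction. Note that erase() modifies s in place, and s = s.substr(pos + delimiter.length()) has the same effect, so work on a copy if you need to preserve the original string. Also check that find() did not return std::string::npos before erasing: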

    s.erase(0, s.find(delimiter) + delimiter.length());
    

    This way you can easily loop to get each token.

    Complete Example

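    #include <cstddef>
    #include <string>
    #include <vector>

    // s is taken by value so the function can erase from its own copy
    std::vector<std::string> split(std::string s, const std::string& delimiter) {
        std::vector<std::string> tokens;
        if (delimiter.empty()) {
            // find("") always matches at position 0, which would loop forever
            tokens.push_back(s);
            return tokens;
        }
        std::size_t pos = 0;
        while ((pos = s.find(delimiter)) != std::string::npos) {
            tokens.push_back(s.substr(0, pos));     // text before the delimiter
            s.erase(0, pos + delimiter.length());   // drop the token and the delimiter
        }
        tokens.push_back(s);                        // remainder after the last delimiter

        return tokens;
    }

    int main() {
        std::string s = "scott>=tiger>=mushroom";
        std::string delimiter = ">=";

        std::vector<std::string> tokens = split(s, delimiter); // ["scott", "tiger", "mushroom"]
    }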
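
    The erase-based loop above consumes the string as it goes. If the input should stay untouched and repeated erasing from the front is a concern, one possible alternative is to pass a start offset to find() and never modify the string. The sketch below assumes that an empty delimiter should return the input as a single token; the name split_by_offset is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical variant: walks the input with a start offset instead of erasing.
std::vector<std::string> split_by_offset(const std::string& s,
                                         const std::string& delimiter) {
    std::vector<std::string> tokens;
    if (delimiter.empty()) {
        // Assumed behaviour: an empty delimiter yields the input as one token.
        tokens.push_back(s);
        return tokens;
    }
    std::string::size_type start = 0;
    std::string::size_type pos;
    // find(delimiter, start) searches from `start` onward.
    while ((pos = s.find(delimiter, start)) != std::string::npos) {
        tokens.push_back(s.substr(start, pos - start));
        start = pos + delimiter.length();
    }
    // Whatever follows the last delimiter (possibly an empty string).
    tokens.push_back(s.substr(start));
    return tokens;
}
```

    With this version the input can be taken by const reference, and a trailing delimiter produces a final empty token, the same as the erase-based loop.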