Tags: c++, pointers, c-strings, dynamic-arrays, strtok

During the second call to strtok(), the program triggers the following (Valgrind) error: Invalid read of size 1


I am new to C++ and I do not understand why the code doesn't continue tokenizing the text array and instead raises this error.

I suspect that the problem might be caused by passing the token pointer mid-tokenization, which makes the code give up, or something along those lines.

The task was as follows:

Write a function that deletes all words in the given text that occur more than once.
Note that a WORD is a sequence of letters separated by whitespace.
Note: the program has to use pointers.

Input:
A single line of at most 1000 characters, including whitespace.
Each word is at most 30 characters long.

Output:
Formatted text.

#include <iostream>
#include <cstring>
#include <cctype>
using namespace std;

bool compareThem(char* lhs, char* rhs) {
    if (strcmp(lhs, rhs) != 0) {
        return false;
    }
    int len = strlen(lhs);

    for (int i = 0; i < len; i++) {
        if (!(lhs[i] == rhs[i])) {
            return false;
        }
    }
    return true;
}

bool isDuplicate(char* possibleDuplicate, char* result) {
    char* soFar = new char[strlen(result) + 1]; 
    strcpy(soFar, result); 

    char* word = strtok(soFar, " ");
    while (word != NULL) {
        if (compareThem(possibleDuplicate, word)) {
            delete[] soFar; 
            return true;
        }
        word = strtok(NULL, " ");
    }

    delete[] soFar; 
    return false;
}

void removeDuplicates(char* text) {
    char* result = new char[1000];
    result[0] = '\0';

    char* word = strtok(text, " ");
    while (word != NULL) {
        if (!isDuplicate(word, result)) {
            strcat(result, word);
            strcat(result, " ");
        }
        word = strtok(NULL, " ");
    }

    cout << result;

    delete[] result;
}

int main() {
    char* str = new char[1000];
    cin.getline(str, 1000);

    removeDuplicates(str);

    delete[] str;
    return 0;
}

Solution

  • strtok() is not reentrant: it can tokenize only one string at a time, because it keeps its position in a single hidden static pointer. When you call it with NULL, it returns the next token of the previous non-NULL string you passed in.

    However, removeDuplicates() is trying to tokenize one string (text) and then calls isDuplicate(), which tries to tokenize another string (soFar). That cannot work: the inner loop overwrites strtok()'s hidden state, so the next strtok(NULL, " ") in removeDuplicates() continues from the soFar buffer, which isDuplicate() has already delete[]d. Reading from that freed buffer is exactly the "Invalid read of size 1" that Valgrind reports. You need to use strtok_r() (POSIX; MSVC's strtok_s() takes the same arguments), which keeps its state in a caller-supplied pointer so each string gets its own, e.g.:

    bool isDuplicate(char* possibleDuplicate, char* result) {
        ...
    
        char *saveptr = NULL;
        char* word = strtok_r(soFar, " ", &saveptr);
        while (word != NULL) {
            ...
            word = strtok_r(NULL, " ", &saveptr);
        }
    
        ...
    }
    
    void removeDuplicates(char* text) {
        ...
    
        char* saveptr = NULL;
        char* word = strtok_r(text, " ", &saveptr);
        while (word != NULL) {
            ...
            word = strtok_r(NULL, " ", &saveptr);
        }
    
        ...
    }
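
    Putting those pieces together, here is a minimal self-contained sketch of the fixed program (my own trimmed version, not your exact code: compareThem() is dropped because strcmp() == 0 already performs the full comparison, and fixed-size buffers replace new[] for brevity):

    ```cpp
    #include <cstdio>
    #include <cstring>

    // Returns true if `word` already appears in the space-separated `result`.
    static bool isDuplicate(const char* word, const char* result) {
        char soFar[1001];
        strcpy(soFar, result);

        char* saveptr = nullptr;  // state local to this call
        for (char* tok = strtok_r(soFar, " ", &saveptr); tok != nullptr;
             tok = strtok_r(nullptr, " ", &saveptr)) {
            if (strcmp(word, tok) == 0) return true;
        }
        return false;
    }

    // Appends each word of `text` to `result` the first time it is seen.
    static void removeDuplicates(char* text, char* result) {
        result[0] = '\0';
        char* saveptr = nullptr;  // independent of the state inside isDuplicate
        for (char* word = strtok_r(text, " ", &saveptr); word != nullptr;
             word = strtok_r(nullptr, " ", &saveptr)) {
            if (!isDuplicate(word, result)) {
                strcat(result, word);
                strcat(result, " ");
            }
        }
    }

    int main() {
        char text[] = "a b a c b d";
        char result[1001];
        removeDuplicates(text, result);
        puts(result);  // prints: a b c d
    }
    ```

    Because each loop owns its saveptr, the outer tokenization of text is untouched by the inner tokenization of soFar.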
    

    That being said, your code is a mix of C and C++ logic. A better solution would be to fully embrace C++ idioms such as std::string and the standard containers, and not use the C string functions at all.
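
    As a sketch of what that might look like (the function name is my own, not part of the task): operator>> on a std::istringstream splits on any whitespace, and std::unordered_set::insert() reports via its .second member whether the word was newly inserted, so no manual rescanning of the output is needed.

    ```cpp
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <unordered_set>

    // Keeps only the first occurrence of each whitespace-separated word.
    std::string removeDuplicateWords(const std::string& text) {
        std::unordered_set<std::string> seen;
        std::istringstream words(text);  // operator>> splits on whitespace
        std::string word, out;
        while (words >> word) {
            // insert().second is true only for words not seen before
            if (seen.insert(word).second) {
                if (!out.empty()) out += ' ';
                out += word;
            }
        }
        return out;
    }

    int main() {
        std::string line;
        std::getline(std::cin, line);
        std::cout << removeDuplicateWords(line) << '\n';
    }
    ```

    This version has no fixed 1000-character limit, no manual memory management, and nothing to free, so the whole class of invalid-read bugs disappears.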