c++algorithmsearchtree

Find a specific word algorithm in c++


i try to build an algorithm to find a forbidden word which has 4 characters. My assignment is to call the function sendMessage() as rarely as possible.

here are some details about the sendMessage() Function: bool sendMessage(std::string message) The message can be a maximum of 10,000 characters long and consist only of lowercase letters from 'a' to 'z'. The function returns true if the message has been censored; otherwise, it returns false.

The way i wanted to solve this problem is that i just create random words with 4 characters and put them all together in a string with 10000 characters (2500 words * 4 characters each). After that i look if the sendMessage() Function returns true and if that is the case, i apply something like a Tree Search where i divide the message into 2 parts, see if it returns true on one of these, if yes devide again into 2 parts and so on.

My problem now is that my program doesn't find the forbidden word. Here is my Code:

#include "zensur.h"

#include <iostream>
#include <string>
#include <utility> // for std::pair

std::pair<std::string, std::string> splitString(const std::string& inputString) {
    // Determine the length of the string
    size_t length = inputString.length();

    // Check if the length is even or odd
    if (length % 2 == 0) {
        // If the length is even, split the string into two halves
        size_t middle = length / 2;

        std::string part1 = inputString.substr(0, middle);
        std::string part2 = inputString.substr(middle);

        return std::make_pair(part1, part2);
    }
    else {
        // If the length is odd, split the string so that the first part has an additional character
        size_t middle = length / 2;

        std::string part1 = inputString.substr(0, middle + 1);
        std::string part2 = inputString.substr(middle + 1);

        return std::make_pair(part1, part2);
    }
}

std::string findBannedWord() {
    const int wordLength = 4;
    std::string concatenatedWords;

    for (char a = 'a'; a <= 'z'; ++a) {
        for (char b = 'a'; b <= 'z'; ++b) {
            for (char c = 'a'; c <= 'z'; ++c) {
                for (char d = 'a'; d <= 'z'; ++d) {

                    // make a potential word
                    std::string potentialWord = { a, b, c, d }; 

                    
                    if (concatenatedWords.length() < (10000 - wordLength)) {    
                        // put 2500 potential words together to a concatenated Word
                        concatenatedWords += potentialWord;
                    }

                    // if the concatenated Word is 10000 long (2500*4) then look if the forbidden word is in it
                    if (concatenatedWords.length() == 10000) {  
                        if (sendMessage(concatenatedWords)) {   

                        // if it is in it then apply something like a tree structure to find it.
                            while (true) {
                                // split it into 2 substrings
                                std::pair<std::string, std::string> innerParts = splitString(concatenatedWords);    

                                // if it is in the first substring
                                if (sendMessage(innerParts.first)) {                
                                    concatenatedWords = innerParts.first;
                                    continue;
                                }

                                // if it is in the second substring
                                else if (sendMessage(innerParts.second)) {            
                                    concatenatedWords = innerParts.second;
                                    continue;
                                }

                                // if it is in none of those substrings then look at the end of first substring and beginning of second substring combined
                                else {                                              
                                    size_t length = innerParts.first.length();
                                    std::string endString = innerParts.first.substr(length - 4);
                                    std::string startString = innerParts.second.substr(0, 4);

                                    // put the end of first substring and beginning of second  
                                    // substring together
                                    std::string realString = endString + startString;   
    
                                    // go through all 5 possibilitys
                                    for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {        
                                        std::string substring = realString.substr(i, wordLength);

                                        // if we found the forbidden word return it
                                        if (sendMessage(substring)) {       
                                            return substring;
                                        }
                                    }
                                }
                            }
                        }
                        // if forbidden word is not in there clear concatenatedWords and try again
                        concatenatedWords = ""; 
                    }
                }
            }
        }
    }

    return ""; // return empty string if word is not found
}

I would be very grateful if someone can help me find the mistake in my code. I personally think my error is somewhere in this section:

 // if it is in none of those substrings then look at the end of first substring and beginning of second substring combined
else {                                             
    size_t length = innerParts.first.length();
    std::string endString = innerParts.first.substr(length - 4);
    std::string startString = innerParts.second.substr(0, 4);

    // put the end of first substring and beginning of second substring together
    std::string realString = endString + startString;       

    // go through all 5 possibilitys
    for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {        
        std::string substring = realString.substr(i, wordLength);

        / if we found the forbidden word return it
        if (sendMessage(substring)) {       /
            return substring;
        }
    }

Solution

  • You've implemented binary search, which is a good idea.

    There are a few issues though:

    This leads to the following code:

    const int wordLength = 4; // Used in the two functions below
    
    // Isolate the binary search logic in a separate function
    std::string binarySearch(std::string concatenatedWords) {
        while (true) {
            // split it into 2 substrings
            std::pair<std::string, std::string> innerParts = splitString(concatenatedWords);
            // if it is in the first substring
            if (sendMessage(innerParts.first)) {                
                concatenatedWords = innerParts.first;
                continue;
            }
            // if it is in the second substring
            if (sendMessage(innerParts.second)) {            
                concatenatedWords = innerParts.second;
                continue;
            }
            else {
                size_t length = innerParts.first.length();
                // Fix to avoid a negative argument in substr. 
                // Fix to use the wordLength constant.
                std::string endString = innerParts.first.substr(length >= wordLength
     ? length - wordLength + 1 : 0);
                std::string startString = innerParts.second.substr(0, wordLength - 1);
        
                std::string realString = endString + startString;   
    
                // Fix to use the wordLength constant
                for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {        
                    std::string substring = realString.substr(i, 4);
                    if (sendMessage(substring)) {       
                        return substring;
                    }
                }
            }
        }
        return "";
    }
    
    
    std::string findBannedWord() {
        std::string concatenatedWords;
    
        for (char a = 'a'; a <= 'z'; ++a) {
            for (char b = 'a'; b <= 'z'; ++b) {
                for (char c = 'a'; c <= 'z'; ++c) {
                    for (char d = 'a'; d <= 'z'; ++d) {
                        std::string potentialWord = { a, b, c, d };
                        // Fixed comparison:
                        if (concatenatedWords.length() <= (10000 - wordLength)) {
                            concatenatedWords += potentialWord;
                        }
                        else if (concatenatedWords.length() == 10000) {
                            if (sendMessage(concatenatedWords)) {
                                return binarySearch(concatenatedWords);
                            }
                            concatenatedWords = "";
                        }
                    }
                }
            }
        }
        // Also look in the "left over"
        return binarySearch(concatenatedWords); 
    }
    

    There is room for optimisation because it is possible to use shorter strings. Potentially at every single index you could have a new 4-letter word starting. Almost every 4-letter word now appears multiple times in your concatenated string, and this is a waste of space, leading to a few more calls of the sendMessage function in the worst case than might be strictly necessary.