
C++ std::string questionable usage


I am new to C++. In order to solve an online puzzle, I needed to extract numbers from strings and keep only the first and last number. For example 4nfhfbk56khfkvh => 456 => 46. So I made the function given below.

int num_extr(const std::string &thisline){
 
     std::string stored_numbers = thisline;
     int i,k,result;
 
     k = 0;
 
     for(i=0; i<thisline.size(); i++){
 
         if(thisline[i] >= '0' && thisline[i] <= '9'){
             std::cout << "Number: " << thisline[i] << '\n';
             stored_numbers[k] = thisline[i];
             ++k;
         }
     }
     std::cout << "Stored_numbers: " << stored_numbers << '\n';
     std::cout << "Stored_numbers[0]: " << stored_numbers[0] << '\n';
     std::cout << "Stored_numbers[k-1]: " << stored_numbers[k-1] << '\n';

     if(k>0){
         result = ((int)stored_numbers[0] -48) * 10 + ((int)stored_numbers[k-1] -48);
     }
     else{
         result = ((int)stored_numbers[0] -48) * 10 + ((int)stored_numbers[0] -48);
     }
     return result;
 }

As you can see, I assign stored_numbers = thisline in order to make stored_numbers the same length as thisline (is this faster than using a function to find the length of thisline?). Then I store the digits I find in it, starting from [0]. This code works. Here is an example of the output:

Initial string: trknlxnv43zxlrqjtwonect
Number: 4
Number: 3
Stored_numbers: 43knlxnv43zxlrqjtwonect
Stored_numbers[0]: 4
Stored_numbers[k-1]: 3
Extracted number: 43

But when I removed the assignment stored_numbers = thisline, so that the code becomes:

int num_extr(const std::string &thisline){
 
     std::string stored_numbers;
     int i,k,result;
 
     k = 0;
 
     for(i=0; i<thisline.size(); i++){
 
         if(thisline[i] >= '0' && thisline[i] <= '9'){
             std::cout << "Number: " << thisline[i] << '\n';
             stored_numbers[k] = thisline[i];
             ++k;
         }
     }
     std::cout << "Stored_numbers: " << stored_numbers << '\n';
     std::cout << "Stored_numbers[0]: " << stored_numbers[0] << '\n';
     std::cout << "Stored_numbers[k-1]: " << stored_numbers[k-1] << '\n';

     if(k>0){
         result = ((int)stored_numbers[0] -48) * 10 + ((int)stored_numbers[k-1] -48);
     }
     else{
         result = ((int)stored_numbers[0] -48) * 10 + ((int)stored_numbers[0] -48);
     }
     return result;
 }

Then the output is:

Initial string: trknlxnv43zxlrqjtwonect
Number: 4
Number: 3
Stored_numbers:
Stored_numbers[0]: 4
Stored_numbers[k-1]: 3
Extracted number: 43

So when I use stored_numbers as an array of (characters?) without the assignment, it seems like it is no longer one string but becomes a set of (pointers?) that cannot be printed as one string with cout... Is this idea in the right direction? Could someone please explain this behavior in more depth?


Solution

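  • A much simpler solution, which takes advantage of the fact that you only need to keep the first and last digits found in the string.

    Note: you don't need to hardcode 48 as the ASCII (ordinal) value of '0'. You can simply write the character literal '0' instead, which is also clearer. The C++ standard guarantees that '0' through '9' are contiguous, so c - '0' gives the digit's value.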

    #include <string>

    int num_extr(const std::string &thisline){
        char first = '\0';   // null char, not literal '0'
        char last = '\0';    // null char, not literal '0'
        char* ptr = &first;  // the first digit goes into 'first', every later one into 'last'

        for (char c : thisline) {
            if ((c >= '0') && (c <= '9')) {  // or std::isdigit(static_cast<unsigned char>(c))
                *ptr = c;
                ptr = &last;
            }
        }

        if (!first) {
            return 0;         // no digits at all
        }

        if (!last) {
            last = first;     // only one digit: it is both the first and the last ("a7b" => 77)
        }

        return 10*(first-'0') + (last-'0');
    }
    

    Now to answer your question. Let's look at this code snippet from your second code sample:

    std::string stored_numbers;
    int i,k,result;

    k = 0;

    for(i=0; i<thisline.size(); i++){
        if(thisline[i] >= '0' && thisline[i] <= '9'){
            std::cout << "Number: " << thisline[i] << '\n';
            stored_numbers[k] = thisline[i];
            ++k;
        }
    }
    

    The problem with the above is that stored_numbers is default-constructed and therefore empty: its size() is 0, so there is no valid index to write to. So when the code reaches this line:

    stored_numbers[k] = thisline[i];

    It's really not that much different from something like this:

    int arr[3];   // valid indices are 0-2
    arr[0] = 42;  // valid
    arr[1] = 84;  // valid
    arr[2] = 126; // valid
    arr[3] = 168; // UNDEFINED BEHAVIOR
    

    Or, closer to your example:

    int arr[0];   // zero-length arrays are not standard C++ (GCC/Clang accept them as an extension);
                  // conceptually the length is 0, so no index value is valid
    arr[0] = 42; // UNDEFINED BEHAVIOR
    

    The moment you write at an index >= the length of the array, you're in undefined behavior territory; a common effect is that you've overwritten memory belonging to something else. The same applies to a string, since it is effectively a container managing a character array. std::string's [] operator does no bounds checking: a typical implementation simply returns a reference to the character at that offset in the underlying buffer, whether or not the index is less than size(). The bounds-checked alternative is at(), which throws std::out_of_range for an index >= size().

    Now for why your program still "happens to work" and prints 43 at the end. Most likely, but not definitively, the string keeps a length member along with a small internal buffer (the small-string optimization used by the common standard library implementations) that you didn't actually overflow (yet). So when cout << stored_numbers is executed, the string thinks its length is still zero, since there were no valid append operations, and it streams no output. But when stored_numbers[0] is referenced, the string returns whatever is in its buffer without checking the length member. That's just a guess. "Undefined behavior" also means your code could turn you into a frog, but that's up to the compiler and runtime authors to decide.
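A minimal sketch of the second sample with the out-of-bounds writes replaced by push_back(), assuming the same "first and last digit" rule and assuming a line with no digits should yield 0. The names num_extr_fixed and at_is_bounds_checked are hypothetical, chosen only for illustration. Because push_back() updates size(), cout << stored_numbers would now print the collected digits; at() is shown as the bounds-checked counterpart of operator[].

```cpp
#include <stdexcept>
#include <string>

// Hypothetical corrected version of the second sample: stored_numbers grows
// through push_back(), so size() always matches the digits actually stored.
int num_extr_fixed(const std::string &thisline) {
    std::string stored_numbers;
    stored_numbers.reserve(thisline.size());  // optional: preallocate capacity, size() stays 0

    for (char c : thisline) {
        if (c >= '0' && c <= '9') {
            stored_numbers.push_back(c);  // appends the digit and increments size()
        }
    }

    if (stored_numbers.empty()) {
        return 0;  // assumption: a line without digits yields 0
    }

    // front()/back() are valid because the string is non-empty; with a single
    // digit they refer to the same character, so "a7b" gives 77.
    return 10 * (stored_numbers.front() - '0') + (stored_numbers.back() - '0');
}

// Hypothetical helper: returns true if at(0) on an empty string throws,
// i.e. at() checks the index where operator[] would silently invoke UB.
bool at_is_bounds_checked() {
    std::string empty;
    try {
        empty.at(0) = '4';
    } catch (const std::out_of_range &) {
        return true;
    }
    return false;
}
```

For example, num_extr_fixed("trknlxnv43zxlrqjtwonect") returns 43 and num_extr_fixed("4nfhfbk56khfkvh") returns 46. On the side question about speed: size() is constant-time, while copying thisline allocates and copies the whole line, so reserve(thisline.size()) is the cheaper way to preallocate if you want to avoid reallocations at all.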