Tags: c, while-loop, scanf, strtok, stack-smash

strtok() sometimes(??) causing stack smashing?


Using Kubuntu 22.04 LTS, Kate v22.04.3, and gcc v11.3.0, I have developed a small program to investigate the use of strtok() for tokenising strings, which is shown below.

#include <stdio.h>
#include <string.h>

int main(void)
{
   char inString[] = "";         // string read in from keyboard.
   char * token    = "";         // A word (token) from the input string.
   char delimiters[] = " ,";     // Items that separate words (tokens).
   
   // explain nature of program.
   printf("This program reads in a string from the keyboard"
          "\nand breaks it into separate words (tokens) which"
          "\nare then output one token per line.\n");
   printf("\nEnter a string: ");
   scanf("%s", inString);
   
   /* get the first token */
   token = strtok(inString, delimiters);
   
   /* Walk through other tokens. */
   while (token != NULL)
   {
      printf("%s", token);
      printf("\n");
      
      // Get next token.
      token = strtok(NULL, delimiters);
   }
   return 0;
}

From the various web pages that I have viewed, it would seem that I have formatted the strtok() function call correctly. On the first run, the program produces the following output.

$ ./ex6_2
This program reads in a string from the keyboard
and breaks it into separate words (tokens) which
are then output one token per line.

Enter a string: fred ,  steve ,   nick
f
ed

On the second run, it produced the following output.

$ ./ex6_2
This program reads in a string from the keyboard
and breaks it into separate words (tokens) which
are then output one token per line.

Enter a string: steve ,  barney ,   nick
s
eve
*** stack smashing detected ***: terminated
Aborted (core dumped)

Subsequent runs showed that the program sort of ran, as in the first case above, if the first word/token contained only four characters. However, if the first word/token contained five or more characters then stack smashing occurred.

Given that "char *" is used to access the tokens, why :-

a) is the first token (in each case) split at the second character ?

b) are the subsequent tokens (in each case) not output ?

c) does a first word/token of greater than four characters cause stack smashing?

Stuart


Solution

  • The declaration

    char inString[] = "";
    

    is equivalent to:

    char inString[1] = "";
    

    This means that you are allocating an array of only a single element, so it only has space for storing a single character.
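
    One way to check this on your own system is to print the size of the array with sizeof. The following is only a small sketch; the helper name empty_literal_array_size is made up for illustration and is not part of your program.

    ```c
    #include <stdio.h>
    #include <stddef.h>

    /* Hypothetical helper: returns the size in bytes of an array that is
       declared the same way as inString in the question. */
    size_t empty_literal_array_size(void)
    {
       char inString[] = "";    /* size is deduced from the initializer */
       return sizeof inString;  /* room for the terminating null only   */
    }

    int main(void)
    {
       /* prints "sizeof inString = 1" */
       printf("sizeof inString = %zu\n", empty_literal_array_size());
       return 0;
    }
    ```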

    The function call

    scanf("%s", inString);
    

    requires that the function argument inString points to a memory buffer that is sufficiently large to store the matched input. Your program is violating this requirement, as the memory buffer has only space for a single character (the terminating null character). It can therefore only store strings with a length of zero.

    By violating the requirement, your program is invoking undefined behavior, which means that anything can happen, including the strange behavior that you observed. The function scanf is probably overflowing the buffer inString, overwriting other important data on your program's stack, causing it to misbehave. This is called "stack smashing".

    To fix this, you should give the array inString more space, for example by changing the line

    char inString[] = "";
    

    to:

    char inString[200] = "";
    

    However, in that case, if the user enters 200 or more characters of input as a single word, then you will have the same problem again and your program may crash, because the terminating null character also needs space. Therefore, you may want to additionally limit the number of characters matched by scanf to 199 characters (200 including the terminating null character). That way, the user cannot overflow the buffer, no matter how long the input is.

    You can add such a limit like this:

    scanf("%199s", inString);
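
    The effect of such a width limit can be seen in the following sketch. It makes a few assumptions for the sake of a short demonstration: the buffer is deliberately tiny (WORD_CAP is a made-up constant), the helper read_first_word is hypothetical, and sscanf is used in place of scanf so that the result does not depend on keyboard input. The width and the return-value check work the same way with scanf.

    ```c
    #include <stdio.h>

    #define WORD_CAP 8   /* small on purpose so truncation is easy to see */

    /* Hypothetical helper: copies the first whitespace-delimited word of
       `input` into `word`, storing at most WORD_CAP - 1 characters plus
       the terminating null character. Returns 1 on success, 0 otherwise. */
    int read_first_word(const char *input, char word[WORD_CAP])
    {
       /* The width 7 is WORD_CAP - 1. The return value is the number of
          successful conversions, so anything other than 1 means failure. */
       return sscanf(input, "%7s", word) == 1;
    }

    int main(void)
    {
       char word[WORD_CAP];

       if (read_first_word("barney , nick", word))
          printf("[%s]\n", word);   /* prints [barney] */

       if (read_first_word("abcdefghijkl", word))
          printf("[%s]\n", word);   /* prints [abcdefg] */

       return 0;
    }
    ```

    In the second call, the input is longer than the buffer, but only the first 7 characters are stored, so the array is never overrun.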
    

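    Note, however, that the %s specifier will only match a single word: it stops at the first whitespace character. Even with a sufficiently large buffer, an input such as "fred ,  steve ,   nick" would therefore only store "fred" in inString, and strtok would never see the remaining words. If you want to read an entire line of input, you may want to use the function fgets instead of scanf.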
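
    A possible fgets-based version of your program is sketched below. It is one way to do it, not the only one. The helper print_tokens and the constant LINE_CAP are names chosen for this example, and the newline character is added to the delimiters because fgets keeps it in the buffer.

    ```c
    #include <stdio.h>
    #include <string.h>

    #define LINE_CAP 200   /* assumed maximum line length, including the null */

    /* Hypothetical helper: splits `line` in place on the given delimiters,
       prints one token per line, and returns the number of tokens found. */
    int print_tokens(char *line, const char *delimiters)
    {
       int count = 0;

       for (char *token = strtok(line, delimiters);
            token != NULL;
            token = strtok(NULL, delimiters))
       {
          printf("%s\n", token);
          count++;
       }
       return count;
    }

    int main(void)
    {
       char inString[LINE_CAP];

       printf("Enter a string: ");
       fflush(stdout);

       /* fgets stores at most LINE_CAP - 1 characters plus the terminating
          null character, and returns NULL on end-of-file or error. */
       if (fgets(inString, sizeof inString, stdin) == NULL)
       {
          printf("\nNo input received.\n");
          return 0;
       }

       /* fgets keeps the newline, so treat it as a delimiter too. */
       print_tokens(inString, " ,\n");
       return 0;
    }
    ```

    With the input "fred ,  steve ,   nick", this version prints fred, steve and nick on separate lines, because strtok treats any run of consecutive delimiter characters as a single separator.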