c

What happens when the second parameter of `fgets` exceeds the buffer size in C?


I am learning C and I quickly encountered my first issue in this tutorial. Here's the code:

#include <stdio.h>
#include <stdbool.h>

static char buffer[10];

int main(int argc, char **argv) {
    puts("Lispy VERSION 0.0.0.0.1");
    puts("Press Ctrl+c to exit\n");

    while (true) {
        fputs("Lispy> ", stdout);

        fgets(buffer, 2048, stdin);

        printf("No you are a %s", buffer);
    }

    return 0;
}

In this code, I wanted to experiment with what happens if the second parameter max_count of the fgets function exceeds the size of the buffer. To my surprise, the program ran perfectly without any errors, and the compiler didn't give me any warnings about potential issues.

I used the debug feature in my editor and found that if I input a string like HELLLOOOOOOOOOOOOOOOOOOOOOOO, the memory situation looks like this:

[Screenshot: debugger memory view of buffer after entering the input]

As shown in the screenshot above, it seems that the entire input string is stored correctly in memory! It also seems that the size of buffer is not exactly what I specified (10 bytes). If it were 10 bytes, the line printf("No you are a %s", buffer); should only print 10 bytes, but it printed the entire string "HELLLOOOOOOOOOOOOOOOOOOOOOOO" instead.

Does this mean I have actually accessed memory beyond the allocated size of the buffer?


Solution

  • By and large, the C language is a mechanism for doing what the programmer says and not for providing checks that they are doing it correctly.

    One reason for this is that, when C was being developed, programming languages were relatively new, and every feature implemented required a considerable amount of work. Somebody had to have the ideas, plan the solution, implement the algorithms, and write the code. As a result, little work went into ensuring that programs did not write beyond the ends of arrays. That was the job of the programmer.

    Another reason is that a desirable feature of C is that it is fast: it does almost no work beyond what is absolutely necessary for a program to work. Adding run-time checks for things the programmer should have verified when writing the program is wasteful, since the check then runs every time the code executes instead of being done once.

    So, the C standard says what happens when a program uses fgets in ways that do not overflow the buffer passed to it. It does not say what happens when a program does overflow the buffer. In the terminology of the C standard, that is undefined behavior: The standard does not specify anything about what must happen.

    When fgets is used as you have shown, the most common consequence is that it writes beyond the array. What happens then depends on the program. Writing beyond the array overwrites other data in the program. In a simple program such as the one you tried, with little else in it, nothing was immediately affected. In a program with more components, writing beyond the array may disrupt other things, and the program may crash or otherwise misbehave.
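    For comparison, here is a minimal sketch of the same read with the count tied to the array via sizeof, so fgets cannot write past the end. The helper name echo_line and the temporary file standing in for stdin are assumptions made for illustration; they are not part of the original program.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    static char buffer[10];

    /* Hypothetical helper: reads one line from `in` in pieces that fit in
       `buffer`, prints each piece, and returns how many pieces it took. */
    static int echo_line(FILE *in) {
        int pieces = 0;
        /* sizeof buffer is 10, so each call stores at most 9 characters
           plus the terminating null byte. */
        while (fgets(buffer, sizeof buffer, in) != NULL) {
            char *newline = strchr(buffer, '\n');
            if (newline != NULL) {
                *newline = '\0';  /* drop the newline for tidier output */
            }
            pieces++;
            printf("piece %d: \"%s\"\n", pieces, buffer);
            if (newline != NULL) {
                break;  /* the whole line has been consumed */
            }
        }
        return pieces;
    }

    int main(void) {
        /* Assumption: a temporary file stands in for stdin so the example
           runs without any typing. */
        FILE *in = tmpfile();
        if (in == NULL) {
            return 1;
        }
        fputs("0123456789ABCDEFGHIJ\n", in);  /* 20 characters plus newline */
        rewind(in);

        int pieces = echo_line(in);
        assert(pieces == 3);  /* 9 + 9 + 2 characters */
        fclose(in);
        return 0;
    }
    ```

    A line longer than the buffer is not lost; the remaining characters stay in the stream and are returned by the following fgets calls.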

    If it were 10 bytes, the line printf("No you are a %s", buffer); should only print 10 bytes…

    Nothing tells printf how big buffer is. When printf sees %s, it detects the end of the string by looking for a null byte. Since your fgets wrote a string exceeding the array size, printf followed the bytes, printing each one until it found a null byte. There was a null byte after the string you entered because fgets puts a null byte after the characters it reads.
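    The following sketch illustrates that point without any overflow: printf stops at the first null byte regardless of the array's size, and it limits its output only when given an explicit precision. The array contents and the helper print_bounded are made up for illustration.

    ```c
    #include <assert.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: prints at most max_chars characters of s and a
       newline, returning the number of characters printf wrote. */
    static int print_bounded(const char *s, int max_chars) {
        return printf("%.*s\n", max_chars, s);
    }

    int main(void) {
        /* A 16-byte array; the initializer places a null byte after "Hello". */
        char data[16] = "Hello\0World";

        /* The array is 16 bytes, but the string in it is 5 characters long. */
        printf("sizeof data = %zu, strlen(data) = %zu\n",
               sizeof data, strlen(data));

        printf("%s\n", data);                  /* stops at the null byte: Hello */

        int written = print_bounded(data, 3);  /* precision caps it: Hel */
        assert(written == 4);                  /* "Hel" plus the newline */
        return 0;
    }
    ```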

    This is a simple manifestation of undefined behavior. You should be aware that compilers have become sophisticated and, when optimization is turned on, they may transform programs in ways that are surprising at first, so undefined behavior can manifest in surprising ways.

    Second, shouldn't the compiler be able to detect this kind of undefined behavior?

    Some compilers do detect this particular problem. However, it is not possible for compilers to detect all such errors. Quite often, the call to fgets does not appear in code where the declaration of the buffer is visible. Instead, the buffer is passed to some other subroutine, and that subroutine calls fgets. In the code visible to the compiler at that point, it is not apparent that the length being used is greater than the size of the buffer.
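    A small sketch of that situation, with hypothetical helper names: once the array is passed to a function, the function receives only a pointer, so the array's size is no longer visible there. A common convention, shown here as one option rather than the only one, is to pass the size alongside the pointer.

    ```c
    #include <assert.h>
    #include <limits.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: inside this function only a pointer is visible. */
    static size_t size_seen_by_callee(char *dst) {
        return sizeof dst;  /* size of a pointer, not of the caller's array */
    }

    /* Hypothetical helper: the caller states the real capacity explicitly. */
    static char *read_line(char *dst, size_t dst_size, FILE *in) {
        if (dst_size == 0 || dst_size > INT_MAX) {
            return NULL;  /* fgets takes its count as an int */
        }
        return fgets(dst, (int)dst_size, in);
    }

    int main(void) {
        char buffer[10];
        printf("caller sees %zu bytes, callee sees %zu bytes\n",
               sizeof buffer, size_seen_by_callee(buffer));

        /* Assumption: a temporary file stands in for stdin. */
        FILE *in = tmpfile();
        if (in == NULL) {
            return 1;
        }
        fputs("0123456789ABCDEFGHIJ\n", in);
        rewind(in);

        if (read_line(buffer, sizeof buffer, in) != NULL) {
            assert(strlen(buffer) == sizeof buffer - 1);  /* 9 characters */
            printf("read \"%s\"\n", buffer);
        }
        fclose(in);
        return 0;
    }
    ```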