c scanf whitespace conversion-specifier

The times when we have to use `scanf("\n")` and `scanf("%*c")`


In C, when we use a scanf("%s") or scanf("%c") before a scanf("%s") or scanf("%[^\n]"), while giving in the input, there will be an extra '\n' at the end of the first input and that will be passed into the second input and mess up the input stream. I've tried these codes in two different systems with the same gcc compiler as far as I know. But it did different things each time. In the first system I had to use a scanf("\n") to discard the newline character. But in the second system there was no such need. It discarded the newline character automatically.

Then I tried three versions of the code:

code 1:

printf("Enter the name of the student: ");
scanf("%s", name);

printf("Enter the email of the student: ");
scanf("%s", email);

Here I didn't have to discard the newline character, and the program ran without issues.

code 2:

printf("Enter the name of the student: ");
scanf("%s", name);
scanf("%*c");

printf("Enter the email of the student: ");
scanf("%s", email);

Here I just added scanf("%*c") to discard the newline character, and this had the same result as code 1. But when I replace scanf("%*c") with scanf("\n"), the input is messed up again: I now have to give 3 inputs, because it doesn't properly discard the first newline character.

code 3:

char chr, s[100], sen[100];

scanf("%c", &chr);
scanf("\n");
scanf("%s", s);
scanf("\n");
scanf("%[^\n]%*c", sen);

Here the code doesn't work without the scanf("\n")s.

I don't really understand what's happening here. Why is it that sometimes I have to discard the newline character and other times it's not required?


Solution

  • In C, when we use a scanf("%s") or scanf("%c") before a scanf("%s") or scanf("%[^\n]"), while giving in the input, there will be an extra '\n' at the end of the first input and that will be passed into the second input and mess up the input stream.

    I would not put it that way. I would say:

    Whenever we use scanf with any format specifier, the \n indicating the end of the line the user typed remains on the input stream where it may cause later problems. There are three cases:

    1. If the next input is scanf using any format specifier other than %c or %[…], the lingering \n will automatically be skipped along with any other leading whitespace.
    2. If the next input is scanf using %c or %[…], the \n will be read by that format specifier (although that is usually not desired).
    3. If the next input is something other than scanf, such as fgets or getchar, the \n will be read by that function (although that is usually not desired).

    (Or perhaps more concisely: "If we use scanf("%c") or scanf("%[…]") after having previously called scanf on something else, the leftover \n tends to mess things up.")

    Situation #2 is easily fixed by adding an extra leading space to the format specifier, explicitly instructing scanf to skip leading whitespace (as it implicitly does for all other format specifiers).
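Here is a minimal sketch of situation #2 and its fix. It runs sscanf on a fixed string instead of real keyboard input so the result is reproducible. The function name read_char_after_int and the sample input are made up for illustration.

```c
#include <stdio.h>

/*
 * Reads a number and then a single character from `input`, which stands in
 * for what the user typed. If `skip_ws` is nonzero the character is read
 * with " %c" (leading space), otherwise with plain "%c".
 * Returns the character read, or '\0' if either read fails.
 * (Hypothetical helper, for illustration only.)
 */
char read_char_after_int(const char *input, int skip_ws)
{
    int num, used = 0, got;
    char ch = '\0';

    /* %n records how many characters have been consumed so far. */
    if (sscanf(input, "%d%n", &num, &used) != 1)
        return '\0';

    /* Continue where the first read stopped, like a second scanf call. */
    if (skip_ws)
        got = sscanf(input + used, " %c", &ch);  /* skips the stray \n */
    else
        got = sscanf(input + used, "%c", &ch);   /* reads the stray \n */

    return got == 1 ? ch : '\0';
}
```

With the input "42\ny\n", read_char_after_int(input, 0) returns the leftover '\n', while read_char_after_int(input, 1) returns 'y'.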

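    Calling scanf("\n") by itself is not usually recommended, as it doesn't do what it appears to do. A whitespace character in a scanf format string matches any amount of whitespace, not just one newline, so scanf("\n") keeps reading until it sees a non-whitespace character. At an interactive prompt that means it doesn't return until you've typed the start of the next input, which is most likely why your modified code 2 seemed to need extra input.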
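To see how much a lone "\n" directive really consumes, here is a small sketch that uses sscanf and %n on a fixed string. The function name consumed_by_newline_directive is hypothetical. A string can't show the blocking you get at a terminal, only the amount of input that gets eaten.

```c
#include <stdio.h>

/*
 * Returns how many characters of `input` a lone "\n" format directive
 * consumes. (Hypothetical helper, for illustration only.)
 */
int consumed_by_newline_directive(const char *input)
{
    int used = 0;

    /* No conversions are assigned here, so the return value is ignored;
     * %n just reports how far the whitespace directive advanced. */
    (void)sscanf(input, "\n%n", &used);
    return used;
}
```

For "\nnext" this returns 1, but for "\n\n  next" it returns 4 (both newlines and both spaces), and for "   next" it returns 3 even though there is no newline at all.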

    These rules aren't the complete story on scanf, but they're a good start and explain the behavior in most basic uses.

    I've tried these codes in two different systems with the same gcc compiler as far as I know. But it did different things each time.

    That is surprising. If you could describe that difference exactly and reproducibly, it might be interesting to explore.

    In your code 1, you're reading two simple strings, so as long as they don't contain spaces, you're fine. There's a \n left over after scanf reads the first string, but it's skipped by the second %s.

    In your code 2, the call to scanf("%*c") reads and discards the \n. This is indeed one way to explicitly discard a stray newline, but I don't believe it's the best way. (The problems are that it will read any character — not necessarily just a newline — and if you call it at a spot where there isn't a stray character to consume, it will block, waiting for a character it can consume.)
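A short sketch of the first problem with %*c, again using sscanf on a fixed string so it is reproducible. The name char_after_discard and the sample inputs are assumptions made for illustration.

```c
#include <stdio.h>

/*
 * Reads an int from `input`, discards one character with "%*c", then
 * reads the next character with "%c" and returns it.
 * Returns '\0' if any step fails. (Hypothetical helper.)
 */
char char_after_discard(const char *input)
{
    int num, used = -1;
    char ch = '\0';

    /* %*c throws away exactly one character, whatever it happens to be.
     * `used` stays -1 if there was no character for %*c to consume. */
    if (sscanf(input, "%d%*c%n", &num, &used) != 1 || used < 0)
        return '\0';

    if (sscanf(input + used, "%c", &ch) != 1)
        return '\0';
    return ch;
}
```

For "42\nx" the %*c eats the newline and the function returns 'x'. For "42 \nx" (a trailing space after the number) the %*c eats the space instead, the newline is still there, and the function returns '\n'.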

    In your code 3, you say "the code doesn't work without the scanf("\n")s", and that's partly true. After reading a string with %s, there's a \n still on the input stream. So the next %[^\n] sees that stray \n immediately, matches nothing, and fails without storing anything in sen. So your call to scanf("\n") is one way to strip it. (But you didn't need to call scanf("\n") between the %c and the %s.)

    As an aside, rather than calling scanf("\n"), I believe it would have been cleaner to just add a leading space to the following %[…] specifier: scanf(" %[^\n]%*c", sen);.
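Put together, code 3 might look like the sketch below. It is wrapped in a function that takes a FILE * so it can be tried on a file as well as on stdin. The function name read_code3 and the %99 field widths (which keep the reads inside the 100-byte arrays) are my additions, not part of the original code.

```c
#include <stdio.h>

/*
 * Reads a character, a word, and the rest of a line from `in`.
 * `s` and `sen` must each point to at least 100 bytes.
 * Returns the number of items successfully read (3 on success).
 * (Hypothetical wrapper around the calls from code 3.)
 */
int read_code3(FILE *in, char *chr, char *s, char *sen)
{
    if (fscanf(in, "%c", chr) != 1)
        return 0;

    /* %s skips the newline left behind by %c on its own. */
    if (fscanf(in, "%99s", s) != 1)
        return 1;

    /* The leading space skips the newline left behind by %s. */
    if (fscanf(in, " %99[^\n]", sen) != 1)
        return 2;

    return 3;
}
```

Called as read_code3(stdin, &chr, s, sen), this needs no separate scanf("\n") calls and no trailing %*c.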

    (Also you don't necessarily need that %*c out at the end. I know what it's for, but I'd say you don't or shouldn't need it. More on this below.)


    The bottom line is that scanf is a pretty highly problematic input function. It's nice and simple to use for really simple inputs, in simple, beginning programs. But it's not much good — it's a frustrating nuisance, generally more trouble than it's worth — for anything fancy.

    What do I mean by "really simple inputs"? It's a pretty short list:

    And that's it. If you want to do anything fancier than that, scanf starts turning into a pumpkin. In particular:

    And, as a more general rule, if you're trying to do something with scanf, and it's not working, and you come to believe that you need to "flush" some unwanted input, stop right there. The very fact that the word "flush" has entered your mind is an almost perfectly reliable indicator that scanf is not the right tool for the job you're trying to do, and you should strongly consider abandoning scanf in favor of something else. (Whatever it is that you're trying to do, although it might be barely possible to do using scanf and some clumsy input-flushing mechanism, it's just about guaranteed to be harder to do, and work less well, than if you used something else.)

    See here and here for some similar sets of guidelines on what to try to do, versus not do, using scanf. And when you're ready to try something else, read What can I use for input conversion instead of scanf?.
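The linked material isn't reproduced here, but one widely used alternative is to read a whole line with fgets and then parse it with sscanf. The sketch below assumes lines shorter than 128 characters; the function name read_int_line is made up for illustration.

```c
#include <stdio.h>

/*
 * Reads one full line from `in` and tries to parse an int from it.
 * Returns 1 on success, 0 at end of input or if the line does not
 * start with a number. (Hypothetical helper; assumes lines shorter
 * than 128 characters.)
 */
int read_int_line(FILE *in, int *out)
{
    char line[128];

    /* fgets consumes the whole line, newline included, so nothing is
     * left behind for the next read to trip over. */
    if (fgets(line, sizeof line, in) == NULL)
        return 0;

    return sscanf(line, "%d", out) == 1;
}
```

Because each call consumes an entire line, a bad line such as "abc" is simply gone after the failed call, and the next call starts cleanly on the following line with nothing to "flush".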


    Finally, a few more words about your call to scanf("%[^\n]%*c", sen);, and specifically that extra %*c specifier at the end. I said I thought you didn't need it. Here's why.

    What does that %*c do? Well, %[^\n] reads a line of text up to but not including a newline, so %*c reads and discards the newline, presumably so that it "won't stick around on the input stream and cause problems later". But let's think about this a little more.

    When you get right down to it, scanf is always leaving newlines on the input stream. It's the default behavior with scanf. And, if the next call is careful to always skip the stray newline, everything works out fine.

    Suppose you call

    scanf("%d", &num1);
    scanf("%d", &num2);
    scanf("%d", &num3);
    

    There's a newline left over after scanning num1, and there's a newline left over after scanning num2, but neither of them causes any problems, because the next %d cleanly skips it.

    It's not really a problem that there's always a stray newline after a scanf call, unless the next input call doesn't skip it. That happens if (a) the next input call is scanf using %c or %[…], or (b) the next input call isn't scanf at all, but is rather something like fgets or getchar.

    So one way (and IMO the best and cleanest way) of dealing with stray newlines with scanf is just to declare that it's always the next call's job to skip them:

    1. If the next call involves a specifier like %d or %f or %s, it automatically skips leading whitespace, so we're fine.
    2. If the next call involves the specifiers %c or %[…], add an explicit leading space to force a stray newline to be skipped: " %c" or " %[…]".
    3. If the next call isn't scanf... well, that's a bad idea. My recommendation is to never mix scanf with other input functions in the same program.

    There's the concept of a loop invariant, which is something useful you can say that's true for each trip through a loop, like "the variable i is always the index of the next cell we'll fill in" or "len is always the length of the string built so far."

    Similarly, when reading input a line at a time, it's useful to have a convention for how the newlines are handled. We could have either:

    1. "As each line of input is read, the \n is read also. The input stream is left positioned at the beginning of the next line to be read." Or,
    2. "After reading each line of input, the \n is left on the input stream. It is always the job of the next line-reading function to first skip past that \n."

    Both of these conventions are consistent. If you're reading lines of input using fgets, you're obviously using convention #1. But if you're reading input using scanf, well, scanf is mostly happy with convention #2.

    And I believe, for sanity's sake, that if you're using scanf, you should embrace convention #2 wholeheartedly. Don't try to strip the \n after reading a line. Don't use locutions like %*c after %[…] so that "the newline won't cause problems later". Just let the stray newlines linger, and make sure that you always skip them next time. %d and %f and %s do that automatically. %c and %[…] do that, in effect, if you explicitly add a leading space. And don't try to use fgets or getchar in a program where you're also using scanf. (Or, if you must intermix scanf and fgets, then right before calling fgets, do something to skip the newline if it's there. That's one place where a call to scanf("\n") might be warranted.)
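If you do have to intermix the two, here is one possible sketch of "skip the newline if it's there" right before fgets. It uses getc and ungetc rather than scanf("\n"), so it removes at most one newline and leaves everything else alone. Both function names are hypothetical.

```c
#include <stdio.h>

/*
 * If the next character on `in` is a newline, consume it.
 * Any other character is pushed back. (Hypothetical helper.)
 */
void skip_pending_newline(FILE *in)
{
    int c = getc(in);

    if (c != '\n' && c != EOF)
        ungetc(c, in);
}

/*
 * Reads an int with fscanf, then a full line with fgets.
 * Returns 1 on success, 0 on failure. (Hypothetical example.)
 */
int read_age_then_line(FILE *in, int *age, char *line, int size)
{
    if (fscanf(in, "%d", age) != 1)
        return 0;

    /* Without this, fgets would return just the "\n" left by %d. */
    skip_pending_newline(in);

    return fgets(line, size, in) != NULL;
}
```

This only handles a newline that immediately follows the number. If the user typed trailing spaces before pressing Enter, fgets still gets a short leftover line, which is one more reason to avoid mixing the two styles.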