
'while head -n 1' curiosities


Some coding experiments (made while attempting to find a shorter answer to a coding question) led to a few interesting surprises:

seq 2 | while head -n 1 ; do : ; done

Output (hit Control-C or it'll waste CPU cycles forever):

1
^C

The same, but using a redirected input file instead of piped input:

seq 2 > two
while head -n 1 ; do : ; done < two

Output (hit Control-C):

1
2
^C

Questions:

  1. Why does the while loop not stop the way seq 2 | head -n 1 would?

  2. Why would redirected input produce more output than piped input?


The above code was tested with dash and bash on a recent Lubuntu. Both seq and head are from the GNU coreutils package (version 8.25-2ubuntu2).

A method to avoid having to hit Ctrl-C:

timeout .1 sh -c "seq 2 > two ; while head -n 1 ; do : ; done < two"

1
2

timeout .1 sh -c "seq 2 | while head -n 1 ; do : ; done"

1


Solution

  • head -n 1, when given an empty stream on stdin, is well within its rights and specification to immediately exit with a successful exit status.

    Thus:

    seq 2 | while head -n 1 ; do : ; done
    

    ...can legally loop forever, as head -n 1 is not required to exit with a nonzero status and thereby terminate the loop. (The standard only requires a nonzero exit status if "an error occurred", and a file having fewer lines than were requested for output is explicitly not defined as an error.)

    Indeed, this is explicit:

    When a file contains less than number lines, it shall be copied to standard output in its entirety. This shall not be an error.

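This is easy to verify directly: on an empty stream, head -n 1 prints nothing and still exits successfully, so it can never terminate the while loop on its own.

```shell
# head -n 1 on an empty stream: no output, successful exit status.
head -n 1 < /dev/null
echo "exit status: $?"
# prints: exit status: 0
```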

    Now, if your implementation of head, after its first invocation (printing the first line), leaves the file offset positioned at the beginning of the second line when it exits (which it is absolutely not required to do), then the second loop iteration will read that second line and emit it. Again, however, this is an implementation detail, depending on whether the folks writing your head implementation chose to either:

    1. Read an aggressively large block, but only emit a subset of it. (The more efficient implementation.)
    2. Or loop character-by-character to only consume a single line.

    An implementer is well within their rights to decide which of those implementations to follow based on criteria only available at runtime.


    Now, let's say your head always tries to read 8 KiB blocks at a time. How, then, could it ever leave the offset positioned at the second line? [* - other than seeking backwards, which some implementations do when given a seekable file, but which is not required by the standard; thanks to Rob Mayhoff for the pointer here]
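    The seek-back behaviour mentioned in the footnote can be observed directly with a seekable file (shown here with GNU head; other implementations may leave the offset elsewhere): after head -n 1 consumes the first line, a following cat on the same file descriptor sees the rest of the file.

    ```shell
    # GNU head reads a large block from a seekable file, but lseeks
    # back so the offset is left just past the last line it printed.
    # A subsequent cat sharing the same redirection picks up line 2.
    seq 2 > two
    { head -n 1; cat; } < two
    ```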

    This can happen if the concurrent invocation of seq has only written and flushed a single line as of when the first read occurs.

    Obviously, this is a very timing-sensitive situation -- a race condition -- and it also depends on unspecified implementation details: whether seq flushes its output between lines (and, as seq is not specified by POSIX or any other standard, this varies completely between platforms).
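    The race can be reproduced deliberately by controlling the writer's timing (the behaviour shown is GNU head's block-reading one; a strictly line-by-line head would print both lines in both cases):

    ```shell
    # Writer emits both lines at once: the first head's big read
    # swallows line 2 as well, so the second head sees EOF and
    # prints nothing.
    printf '1\n2\n' | { head -n 1; head -n 1; }

    # Writer pauses between lines: the first head's read returns
    # only line 1, leaving line 2 in the pipe for the second head.
    (echo 1; sleep 0.2; echo 2) | { head -n 1; head -n 1; }
    ```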