Some coding experiments, made while attempting to find a shorter answer to a coding question, led to a few interesting surprises:
seq 2 | while head -n 1 ; do : ; done
Output (hit Control-C or it'll waste CPU cycles forever):
1
^C
The same, but using a redirected input file instead of piped input:
seq 2 > two
while head -n 1 ; do : ; done < two
Output (hit Control-C):
1
2
^C
Questions:
Why does the while loop not stop the way seq 2 | head -n 1 would?
Why would redirected input produce more output than piped input?
The above code was tested with dash and bash on a recent Lubuntu. Both seq and head are from the coreutils (version 8.25-2ubuntu2) package.
A way to avoid having to hit Ctrl-C:
timeout .1 sh -c "seq 2 > two ; while head -n 1 ; do : ; done < two"
1
2
timeout .1 sh -c "seq 2 | while head -n 1 ; do : ; done"
1
head -n 1, when given an empty stream on stdin, is well within its rights and specification to immediately exit with a successful exit status.
Thus:
seq 2 | while head -n 1 ; do : ; done
...can legally loop forever, as head -n 1 is not required to exit with a nonzero status and thus terminate the loop. (A nonzero exit status is only required by the standard if "an error occurred", and a file having fewer lines than are requested for output is not defined as an error.)
Indeed, this is explicit:
When a file contains less than number lines, it shall be copied to standard output in its entirety. This shall not be an error.
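A quick way to see this for yourself, as a minimal sketch: feed head an empty stream and inspect its exit status. The counter in the second half is an assumption added only so the demonstration terminates; without it the loop would spin exactly like the one in the question.

```shell
# Feed head an empty stream and record its exit status.
head -n 1 < /dev/null
status=$?
echo "exit status: $status"   # prints "exit status: 0"

# The loop condition therefore never fails on its own; only the
# added counter (not part of the original loop) ends this one.
count=0
while head -n 1 < /dev/null && [ "$count" -lt 3 ]; do
    count=$((count + 1))
done
echo "iterations: $count"     # prints "iterations: 3"
```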
Now, if your implementation of head, after its first invocation (printing the contents of the first line), leaves the file pointer queued up at the beginning of the second line when it exits (which it is absolutely not required to do), then the second loop iteration will read that second line and emit it. Again, however, this is an implementation detail: it depends on how the folks writing your head implementation chose to consume their input, for example by reading only as far as the end of the requested line, or by reading a larger block and leaving the pointer wherever that read happened to stop.
An implementer is well within their rights to decide which of those implementations to follow based on criteria only available at runtime.
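To check what your own head does with the file pointer, something like the following sketch may help. It runs two head calls back to back on the same open input, once from a regular file and once from a pipe. The scratch file from mktemp and the variable names are assumptions made for illustration, and the result is deliberately not stated here because it depends on the implementation.

```shell
# Scratch file for the demonstration; mktemp chooses the name.
tmp=$(mktemp)
seq 2 > "$tmp"

# Two consecutive head calls sharing one open regular file.
from_file=$( { head -n 1; head -n 1; } < "$tmp" )

# The same two calls sharing a pipe, which cannot be seeked.
from_pipe=$( seq 2 | { head -n 1; head -n 1; } )

# Show each result on one line (newlines turned into spaces).
printf 'file: %s\n' "$(printf '%s' "$from_file" | tr '\n' ' ')"
printf 'pipe: %s\n' "$(printf '%s' "$from_pipe" | tr '\n' ' ')"

rm -f "$tmp"
```

If the file case shows both lines and the pipe case only the first, the first head call left (or put back) the pointer at line two for the file, but consumed everything from the pipe.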
Now, let's say your head always tries to read 8 KB blocks at a time. How, then, could it ever leave the pointer queued up for the second line? (Other than by seeking backwards, that is, which some implementations do when given a file, but which is not required by the standard; thanks to Rob Mayhoff for the pointer here.)
This can happen if the concurrent invocation of seq has only written and flushed a single line as of when the first read occurs.
Obviously, it's a very timing-sensitive situation, a race condition, and it also depends on unspecified implementation details, such as whether seq flushes its output between lines. Since seq is not specified as part of POSIX or any other standard, that behavior can vary between platforms.