See the following two commands with output:
$ wc < myfile.txt
 4  4 34
$ cat myfile.txt | wc
      4       4      34
My understanding is that these two both connect the stdin of the wc process with the content stream of myfile.txt. But why is the output padded in one case, and not in the other? How does wc tell the difference between the two? Is it not just reading from stdin?
Short answer: because with wc < myfile.txt, the wc program has direct access to the file, and can do things besides reading from it. Specifically, it can get the file's size (and it bases the output column width on that). With cat myfile.txt | wc, it can't do that, so it uses wide columns to make sure there's enough room.
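To see the same distinction from the receiving end, here is a minimal sketch of a shell function that reports what its stdin is attached to. The name stdin_kind is made up for illustration, and the sketch assumes /dev/stdin exists (true on Linux and most modern Unix systems, but not guaranteed everywhere). It is not what wc does internally, only a way to observe the same difference.

```shell
# Hypothetical helper: report what kind of object stdin is attached to.
# Assumes /dev/stdin is available on this system.
stdin_kind() {
    if [ -f /dev/stdin ]; then
        # A redirect such as "< myfile.txt" hands over the file itself,
        # so its size and other metadata can be queried.
        echo "regular file"
    elif [ -p /dev/stdin ]; then
        # "cat myfile.txt | ..." hands over a pipe, which has no useful size.
        echo "pipe"
    else
        echo "something else"
    fi
}
```

Running stdin_kind < myfile.txt and cat myfile.txt | stdin_kind should give different answers, even though both deliver the same bytes.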
Long answer: wc tries to provide nicely columnated output:
$ wc a.txt b.txt
   6    6   88 a.txt
  60  236 1772 b.txt
  66  242 1860 total
In order to estimate how wide its columns need to be, the GNU version of wc runs stat() (or fstat()) on all of its input files (before actually reading them to get the detailed counts), and uses their sizes to determine how large the word/line/character counts might get, and hence how wide it might need to make the columns to have room for all those digits.
If it can't get any of the input files' sizes (e.g. because they're not plain files, but pipes or something similar), it "assumes the worst", and forces a minimum width of 7 digits. So anytime any of the inputs are pipes or anything like that, you're going to get at-least-7-character-wide columns.
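The width rule described above can be approximated in a few lines of shell. This is only a sketch under stated assumptions, not GNU wc's actual code: the function name wc_width_sketch is hypothetical, it reads sizes with wc -c instead of calling stat(), it does not handle "-" for stdin, and it treats anything that is not a regular file (including a missing file) as forcing the 7-character minimum.

```shell
# Hypothetical helper: rough imitation of how GNU wc picks its column width.
# Arguments are file names; prints the chosen width in characters.
wc_width_sketch() {
    total=0        # combined size in bytes of all regular files
    min_width=1    # raised to 7 if any input is not a regular file
    for f in "$@"; do
        if [ -f "$f" ]; then
            # Portable way to get a file's size in bytes.
            size=$(wc -c < "$f")
            total=$((total + size))
        else
            # Pipes, FIFOs, devices: size unknown, so assume the worst.
            min_width=7
        fi
    done
    # No count can exceed the total byte size, so its digit count is enough.
    width=${#total}
    if [ "$width" -lt "$min_width" ]; then
        width=$min_width
    fi
    echo "$width"
}
```

For the 88-byte a.txt and 1772-byte b.txt used in this answer, wc_width_sketch a.txt b.txt prints 4, because the combined size 1860 has four digits.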
Some examples:
# direct input via stdin
$ wc a.txt - <b.txt
   6    6   88 a.txt
  60  236 1772 -
  66  242 1860 total
# indirect input via cat and a pipe on stdin
$ cat b.txt | wc a.txt -
      6       6      88 a.txt
     60     236    1772 -
     66     242    1860 total
# direct via file descriptor #4
$ wc a.txt /dev/fd/4 4<b.txt
   6    6   88 a.txt
  60  236 1772 /dev/fd/4
  66  242 1860 total
# indirect input via cat and a pipe on FD #63
$ wc a.txt <(cat b.txt)
      6       6      88 a.txt
     60     236    1772 /dev/fd/63
     66     242    1860 total