
Why does wc output different padding spaces depending on how stdin is connected?


Consider the following two commands and their output:

$ wc < myfile.txt
 4  4 34
$ cat myfile.txt | wc
      4       4      34

My understanding is that both commands connect the stdin of the wc process to the contents of myfile.txt. So why is the output padded in one case but not in the other? How can wc tell the difference between the two? Isn't it just reading from stdin?


Solution

  • Short answer: with wc < myfile.txt, the shell opens myfile.txt and passes the open file itself to wc as its standard input, so wc can do more than just read from it. In particular, it can ask for the file's size, and it bases the output column width on that. With cat myfile.txt | wc, standard input is a pipe, which has no meaningful size to ask for, so wc uses wide columns to make sure every number fits.

    Long answer: wc tries to provide nicely columnated output:

    $ wc a.txt b.txt 
       6    6   88 a.txt
      60  236 1772 b.txt
      66  242 1860 total
    

    To estimate how wide its columns need to be, the GNU version of wc calls stat() on each named input file (fstat() for standard input) before reading any of them to get the actual counts. It adds up the sizes of the regular files and uses that total as an upper bound: no line, word, or character count can exceed the number of bytes, so a column wide enough for the total byte count has room for every number wc will print, including the totals line.

    If any input is not a regular file (for example, a pipe or a terminal), its size says nothing about how much data will arrive, so wc "assumes the worst" and forces a minimum width of 7 digits. So whenever any of the inputs is a pipe or something similar, you get columns at least 7 characters wide.

    Some examples:

    # direct input via stdin
    $ wc a.txt - <b.txt
       6    6   88 a.txt
      60  236 1772 -
      66  242 1860 total
    
    # indirect input via cat and a pipe on stdin
    $ cat b.txt | wc a.txt -
          6       6      88 a.txt
         60     236    1772 -
         66     242    1860 total
    
    # direct via file descriptor #4
    $ wc a.txt /dev/fd/4 4<b.txt
       6    6   88 a.txt
      60  236 1772 /dev/fd/4
      66  242 1860 total
    
    # indirect input via cat and a pipe on FD #63
    $ wc a.txt <(cat b.txt)
          6       6      88 a.txt
         60     236    1772 /dev/fd/63
         66     242    1860 total
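The width heuristic described above can be approximated in a few lines of bash. This is a minimal sketch, not GNU wc's actual code: `wc_width` is a hypothetical helper, it assumes Linux (where `/dev/stdin` resolves to whatever file descriptor 0 refers to) and GNU `stat` (for `stat -L -c %s`), it models only wc's default line/word/byte output, and it ignores error cases such as missing files.

```shell
#!/usr/bin/env bash
# wc_width: hypothetical helper (not part of coreutils) that approximates how
# GNU wc chooses its column width. Assumes Linux /dev/stdin and GNU stat.
wc_width() {
    local total=0 min=1 f
    # With no file operands, wc reads standard input.
    if [ "$#" -eq 0 ]; then
        set -- -
    fi
    for f in "$@"; do
        # "-" means standard input; on Linux, /dev/stdin resolves to fd 0,
        # so testing it amounts to an fstat() of standard input.
        if [ "$f" = - ]; then
            f=/dev/stdin
        fi
        if [ -f "$f" ]; then
            # Regular file: its byte size bounds every count wc can print.
            total=$(( total + $(stat -L -c %s "$f") ))
        else
            # Pipe, terminal, etc.: size unknown, so assume the worst.
            min=7
        fi
    done
    # Width is the number of decimal digits in the total size, at least $min.
    local width=${#total}
    if [ "$width" -lt "$min" ]; then
        width=$min
    fi
    echo "$width"
}

# Usage: a scratch file with 4 lines, 4 words and 34 bytes, like myfile.txt.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' abcdefgh abcdefgh abcdefgh abcdef > "$file"

wc_width < "$file"          # prints 2: fd 0 is a regular file of 34 bytes
cat "$file" | wc_width      # prints 7: fd 0 is a pipe
```

Summing the sizes of all regular inputs, rather than taking the largest one, is what leaves room for the "total" line when several files are given.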
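To check what wc actually receives in each of the four examples, a script can apply the same regular-file test to each input. A small sketch, assuming bash on Linux; `describe` is a hypothetical helper, and a scratch file stands in for b.txt so no real file is overwritten:

```shell
#!/usr/bin/env bash
# describe: hypothetical helper that reports what kind of object a path
# refers to, drawing the same line wc does (regular file vs. everything else).
describe() {
    if [ -f "$1" ]; then
        echo "regular file"   # size is known, so narrow columns are possible
    elif [ -p "$1" ]; then
        echo "pipe"           # size is unknown, so wc would use width 7
    else
        echo "other"
    fi
}

b=$(mktemp)                   # scratch file standing in for b.txt
trap 'rm -f "$b"' EXIT
printf 'hello\n' > "$b"

describe /dev/stdin < "$b"        # regular file  (like: wc a.txt - <b.txt)
cat "$b" | describe /dev/stdin    # pipe          (like: cat b.txt | wc a.txt -)
describe /dev/fd/4 4< "$b"        # regular file  (like: wc a.txt /dev/fd/4 4<b.txt)
describe <(cat "$b")              # pipe          (like: wc a.txt <(cat b.txt))
```

The file name wc is given does not matter; what matters is whether the descriptor behind it refers to a regular file or to a pipe.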