windows awk windows-subsystem-for-linux line-breaks dos2unix

What's wrong with repeating entries in awk print statements?


I was trying to answer this other question, about how to repeat an existing column.

I thought this would be fairly easy, just by doing something like:

awk '{print $0 $2}'

This, however, only seems to print $0.

So, I decided to do some more tests:

awk '{print $0 $0}'     # prints the entire line only once
awk '{print $1 $1 $1}'  # prints the first entry only once
awk '{print $2 $1 $0}'  # prints the first entry, followed
                        # by the entire line
                        # (the second part is not printed)
...

Looking at the results, I have the impression that awk is more or less checking what it has already printed and refuses to print it again.

Why is that?

I'm using awk from the Windows Subsystem for Linux (WSL), more precisely the Ubuntu app from Canonical. This is the output of awk --version:

GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
Copyright (C) 1989, 1991-2019 Free Software Foundation.

Solution

  • awk '{print $0 $0}'     # prints the entire line only once
    
    awk '{print $0 $2}'     # prints only $0
    

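    All of these are caused by DOS line endings (\r\n) in your file. awk splits records on \n only, so every record, and therefore its last field, still ends in \r. When that carriage return is printed, the terminal moves the cursor back to the start of the line, and whatever is printed next overwrites what is already displayed. Both copies are in the output, but they overlap on screen, so you see only one.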
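
One way to confirm this diagnosis is to make the carriage returns visible. The sketch below is only an illustration: the file name sample_crlf.txt and its contents are made up, and it assumes a POSIX shell with grep and od available. It counts the lines that contain \r and then dumps awk's output byte by byte.

```shell
# Build a hypothetical two-line sample with DOS (CRLF) line endings.
printf 'a b c\r\nd e f\r\n' > sample_crlf.txt

# Hold a literal carriage return in a variable; command substitution
# strips only trailing newlines, so the \r survives.
cr=$(printf '\r')

# Count the lines that contain a carriage return.
crlf_lines=$(grep -c "$cr" sample_crlf.txt)
echo "lines with CR: $crlf_lines"

# Dump awk's output byte by byte. The \r between the two copies of
# each line is what makes a terminal draw the second over the first.
awk '{print $0 $0}' sample_crlf.txt | od -c
```

If the count is non-zero and od shows \r in the middle of each output line, the file has DOS line endings and the "missing" text is really there.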

    You can remove \r using tr or sed like this:

    tr -d '\r' < file > file.new
    sed -i.bak $'s/\\r$//' file
    

    Or you can ask awk to treat \r\n as the record separator (note: this relies on GNU awk's support for a multi-character RS):

    awk -v RS='\r\n' '{print $0, $0}' file
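
If neither rewriting the file nor changing RS is convenient, another option is to strip the trailing \r inside the awk program itself. This is a minimal sketch, assuming a whitespace-separated file with CRLF endings; sample_crlf.txt and the variable name fixed are made up for the example.

```shell
# Build a hypothetical sample with DOS (CRLF) line endings.
printf 'a b c\r\nd e f\r\n' > sample_crlf.txt

# Remove the trailing carriage return from each record before printing.
# Modifying $0 makes awk re-split the fields, so the last field no
# longer carries the \r either.
fixed=$(awk '{ sub(/\r$/, ""); print $0, $2 }' sample_crlf.txt)

# Prints "a b c b" and "d e f e", each on its own line.
printf '%s\n' "$fixed"
```

Because it uses only sub() and a plain regular expression, this variant should not depend on GNU-specific behaviour.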