linux awk

Simple awk command issue (FS, OFS related)


I tried to reorganize the format of a file containing:

>Human|chr16:86430087-86430726 | element 1 | positive
>Human|chr16:85620095-85621736 | element 2 | negative
>Human|chr16:80423343-80424652 | element 3 | negative
>Human|chr16:80372593-80373755 | element 4 | positive
>Human|chr16:79969907-79971297 | element 5 | negative
>Human|chr16:79949950-79951518 | element 6 | negative
>Human|chr16:79026563-79028162 | element 7 | negative
>Human|chr16:78933253-78934686 | element 9 | negative
>Human|chr16:78832182-78833595 | element 10 | negative

My command is:

awk '{FS="|";OFS="\t"} {print $1,$2,$3,$4,$5}'

Here is the output:

>Human|chr16:86430087-86430726  |      element 1      |
>Human  chr16:85620095-85621736         element 2      negative
>Human  chr16:80423343-80424652         element 3      negative
>Human  chr16:80372593-80373755         element 4      positive
>Human  chr16:79969907-79971297         element 5      negative
>Human  chr16:79949950-79951518         element 6      negative
>Human  chr16:79026563-79028162         element 7      negative
>Human  chr16:78933253-78934686         element 9      negative
>Human  chr16:78832182-78833595         element 10     negative

Every line works fine except for the first line. I don't understand why this happened.

Can someone help me with it? Thanks!


Solution

  • Short answer

    FS is set after the first line has already been split, so the new separator only applies from the second line on. Use something like this instead:

    awk '{print $1,$2,$3,$4,$5}' FS='|' OFS='\t'
    

    You can also use this shorter version:

    awk -v FS='|' -v OFS='\t' '$1=$1'
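
One caveat with the short form, sketched below on made-up sample data (the `a|b|c` and `0|x|y` records are not from the question): `$1=$1` is used as a pattern, so a record is printed only when the assigned value of `$1` is non-empty and not numerically zero. That holds for the data in the question, where `$1` is always `>Human`. If the first field could be empty or `0`, a variant such as `{$1=$1} 1` should avoid dropping those lines.

```shell
# Hypothetical sample data: the second record has "0" as its first field.
sample='a|b|c
0|x|y'

# Short form: the assignment doubles as the pattern, so the "0" record is skipped.
short=$(printf '%s\n' "$sample" | awk -v FS='|' -v OFS='\t' '$1=$1')

# Variant: rebuild the record in an action, then print unconditionally with "1".
safe=$(printf '%s\n' "$sample" | awk -v FS='|' -v OFS='\t' '{$1=$1} 1')

printf 'short form:\n%s\n\nvariant:\n%s\n' "$short" "$safe"
```

In both cases, assigning to `$1` is what makes awk rebuild `$0` with OFS between the fields.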
    

    A bit longer answer

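    It doesn't work because awk splits each record into fields as soon as it is read, before any action runs. When the first action assigns FS, the first record has already been split with the default separator (whitespace), so the new FS only applies from the second record on. OFS is not affected in the same way, since it is only used when print runs, which is why the first output line is already tab-separated. You can force a re-split of the current record by assigning $0 to itself, e.g.: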

    awk '{FS="|";OFS="\t";$0=$0} {print $1,$2,$3,$4,$5}'
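
To see the effect directly, here is a small sketch that prints the field count and `$1` for each record, first with FS assigned in the action and then with the `$0=$0` re-split added. The two input records are shortened stand-ins for the data in the question, and the variable names are only for illustration.

```shell
# Two shortened sample records in the question's format.
input='>Human|chr16:1-2 | element 1 | positive
>Human|chr16:3-4 | element 2 | negative'

# FS is assigned inside the action: record 1 was already split on whitespace.
late=$(printf '%s\n' "$input" |
  awk '{FS="|"} {print NR ": " NF " fields, $1=" $1}')

# Assigning $0 to itself re-splits the current record using the new FS.
resplit=$(printf '%s\n' "$input" |
  awk '{FS="|"; $0=$0} {print NR ": " NF " fields, $1=" $1}')

printf 'FS set late:\n%s\n\nwith $0=$0:\n%s\n' "$late" "$resplit"
```

In the first run, record 1 should report 6 whitespace-separated fields while record 2 reports 4; with the re-split, both records report 4 fields.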
    

    The conventional ways to do this are: (1) set FS and the other variables in a BEGIN clause, (2) set them with the -v VAR=VALUE option, or (3) append them after the script as VAR=VALUE arguments. My preferred style is the last alternative:

    awk '{print $1,$2,$3,$4,$5}' FS='|' OFS='\t'
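
For comparison, the BEGIN-clause alternative could look roughly like the sketch below. It uses one record from the question and prints only the four fields that actually exist (the original command also prints an empty `$5`); the variable names are illustrative.

```shell
# One record taken from the question's input.
line='>Human|chr16:86430087-86430726 | element 1 | positive'

# BEGIN runs before any input is read, so FS is in place for the first record.
out=$(printf '%s\n' "$line" |
  awk 'BEGIN {FS="|"; OFS="\t"} {print $1,$2,$3,$4}')

printf '%s\n' "$out"
```

Note that the spaces around each `|` in the input remain part of the fields, since only the `|` character itself is treated as the separator.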
    

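    Note that -v assignments and post-script assignments take effect at different times. Variables set with -v are assigned before the BEGIN clause runs, whereas a post-script VAR=VALUE assignment is processed only after the BEGIN clause, just before awk reads the input that follows it on the command line (standard input if no file is named).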
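
The timing difference can be checked with a small sketch like the following, which prints a variable from both the BEGIN clause and the main action. The variable name `x` and its value are arbitrary choices for the demonstration.

```shell
# -v assignment: the value is already set when BEGIN runs.
with_v=$(printf 'one line\n' |
  awk -v x=hello 'BEGIN {printf "BEGIN:[%s] ", x} {printf "main:[%s]\n", x}')

# Post-script assignment: BEGIN still sees an empty x; the main action sees the value.
post=$(printf 'one line\n' |
  awk 'BEGIN {printf "BEGIN:[%s] ", x} {printf "main:[%s]\n", x}' x=hello)

printf '%s\n%s\n' "$with_v" "$post"
```

The first command should print `BEGIN:[hello] main:[hello]` and the second `BEGIN:[] main:[hello]`, so anything a BEGIN clause needs has to be passed with -v.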