awk

Why does GNU AWK sub function not act on the selected field in this case?


AWK recognises the field value "b" in this example:

$ printf "a ab    c d b" | awk '{for (i=1;i<=NF;i++) print $i}'
a
ab
c
d
b
$ printf "a ab    c d b" | awk '{for (i=1;i<=NF;i++) if ($i=="b") print $i}'
b

Note the 3 spaces before the "c". If I try to replace the field with value "b" with "X", the sub replacement happens on the first appearance of "b":

$ printf "a ab   c d b" | awk '{for (i=1;i<=NF;i++) if ($i=="b") sub($i,"X"); print}'
a aX   c d b

Is it possible to replace "b" with "X" in the field containing only "b", and also not change the spacing between fields? (Keep the 3 spaces before the "c".)


Solution

  • Your code:

    $ printf "a ab    c d b" |
    awk '{for (i=1;i<=NF;i++) if ($i=="b") sub($i,"X"); print}'
    a aX    c d b
    

    fails to change the field you want because you didn't tell sub() which field to change. Without a 3rd argument, sub() operates on the whole record, $0, so it replaces the first "b" it finds anywhere in the record. Passing the field as the target, e.g. if ($i=="b") sub($i,"X",$i), would fix that, but it would still fail if the string you wanted to match/modify contained regexp metachars, because the 1st argument to sub() is a regexp, not a string. So you should have been trying to do if ($i=="b") sub(/.*/,"X",$i) or, better, simply if ($i=="b") $i="X".
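    A minimal sketch of those variants side by side, in case it helps to see the difference. The function names are hypothetical, made up for this illustration, and the "a+b" input is an assumed example of a field that contains a regexp metachar:

```shell
# Hypothetical helper names, for illustration only.

# As in the question: $i is used as a regexp and, with no 3rd argument, sub() edits $0.
sub_on_record() { awk '{for (i=1;i<=NF;i++) if ($i=="b") sub($i,"X"); print}'; }

# Targets the field, but still treats the field value as a regexp.
sub_on_field() { awk -v val="$1" '{for (i=1;i<=NF;i++) if ($i==val) sub($i,"X",$i); print}'; }

# Plain assignment: no regexp involved at all.
assign_field() { awk -v val="$1" '{for (i=1;i<=NF;i++) if ($i==val) $i="X"; print}'; }

printf 'a ab    c d b\n' | sub_on_record       # a aX    c d b
printf 'a ab    c d b\n' | assign_field b      # a ab c d X
printf 'a a+b c\n' | sub_on_field 'a+b'        # a a+b c  (regexp a+b does not match the string "a+b")
printf 'a a+b c\n' | assign_field 'a+b'        # a X c
```

    Note that the assignment changes the right field but rebuilds the record with single blanks (OFS) between the fields, so the spacing before the "c" is lost.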

    None of that would address the other, harder to handle, part of your question, though:

    and also not change the spacing between fields? (Keep the 3 spaces before the "c".)

    Read understanding-how-ofs-works-in-awk for background, but you have 2 choices:

    1. Change the record, not 1 field of the record, or
    2. Save the spacing between fields, then change the field, then restore the spacing.

    The first one would be this using any POSIX awk:

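    $ printf "a ab    c d b" |
    awk '
        # Pad the record with blanks so "b" is also found as the first or last field.
        match(" "$0" ", /[[:space:]]b[[:space:]]/) {
            # RSTART is relative to the padded string, so the "b" sits at RSTART in $0.
            $0 = substr($0,1,RSTART-1) "X" substr($0,RSTART+RLENGTH-2)
        }
        { print }
    '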
    a ab    c d X
    

    while the second would be this, using GNU awk for the 4th arg to split(). You can do the same in any POSIX awk with a while (match($0, /[^[:space:]]+/)) or similar loop to find the fields, but it takes more code:

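    $ printf "a ab    c d b" |
    awk '
        {
            # With the default FS: flds[1..nf] are the fields, seps[0..nf] the spacing around them.
            nf = split($0, flds, FS, seps)
            rec = seps[0]    # leading spacing, if any
            for (i=1; i<=nf; i++) {
                rec = rec (flds[i]=="b" ? "X" : flds[i]) seps[i]
            }
            $0 = rec
            print
        }
    '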
    a ab    c d X
    

    FWIW, for clarity, simplicity, robustness (it's using all literal string operations on the input data), and flexibility (easy to make a regexp instead of string comparison if necessary), I'd use that last script if I really had to do this job.
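
    If GNU awk isn't available, the match() loop mentioned above might look something like the sketch below. It is only one way to write it: the function name replace_b_posix is hypothetical, and the loop works on a copy of the record (tail) rather than on $0 itself so that the spacing in front of each field can be carried over as it goes:

```shell
# replace_b_posix is a hypothetical name, used here only for illustration.
replace_b_posix() {
    awk '
    {
        rec  = ""      # the record as rebuilt so far
        tail = $0      # the part of the record not yet processed
        # Each match() finds the next run of non-space chars, i.e. the next field.
        while ( match(tail, /[^[:space:]]+/) ) {
            sep  = substr(tail, 1, RSTART-1)       # spacing before the field
            fld  = substr(tail, RSTART, RLENGTH)   # the field itself
            rec  = rec sep (fld == "b" ? "X" : fld)
            tail = substr(tail, RSTART+RLENGTH)
        }
        print rec tail   # tail now holds only trailing spacing, if any
    }
    '
}

printf 'a ab    c d b\n' | replace_b_posix    # a ab    c d X
```

    Like the split() version, this compares each field as a literal string and replaces every field that is exactly "b", not just the first one.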