
Only print if the number in a field is greater than a value with awk


I'm still new to awk. What am I doing wrong? Apologies for the poor description; I have reformulated the question.

Goal

Only print the number in the second field if that number is > 20

lorem v3  <--- no print
ipsum v5  <--- no print
text v21  <--- print "21"
expla v12 <--- no print

My attempt that does not work

awk ' { sub("^v","",$2); if ( $2 > 20 ) print $2 } '

Solution

  • Addressing OP's question about why the current code outputs 3:

    Initially, awk doesn't know whether $2 is a number or a string.

    The sub() call (a string function) assigns a string to $2, which means awk treats $2 as a string for the rest of the script while it processes that line.

    As a result, $2 > 20 is evaluated as a string comparison ('3' > '20'). Strings are compared character by character, and since '3' sorts after '2', the string '3' is greater than the string '20', so 3 is printed.
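    To see the two comparison modes in isolation, here is a minimal sketch that uses only constants and an unmodified input field, so it does not depend on how a particular awk implementation types the result of sub(). The function names compare_demo and input_field_demo are hypothetical.

```shell
#!/bin/sh
# String constants are compared as strings, character by character:
# "3" > "20" is true because '3' sorts after '2'.
# Numeric constants are compared as numbers: 3 > 20 is false.
compare_demo() {
    awk 'BEGIN { s = ("3" > "20"); n = (3 > 20); print s, n }'
}

# An input field that already looks numeric ("3") is compared
# numerically against 20, so the result is false (0).
input_field_demo() {
    echo "lorem 3" | awk '{ r = ($2 > 20); print r }'
}

compare_demo       # prints: 1 0
input_field_demo   # prints: 0
```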

    To get a numeric comparison, we need to force awk to evaluate $2 as a number. One method is to add zero, i.e., $2+0. Making this one change to OP's current code:

    $ echo "lorem v3" | awk ' { sub("^v","",$2); if ( $2+0 > 20 ) print $2 } '
               <<< no output
    

    NOTE: for more details see GNU awk - variable typing
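    A sketch of another approach, assuming the same input format: copy $2 into a separate variable, strip the "v" from the copy, and convert it once with += 0 so the variable holds a number and $2 itself is left untouched. The function name filter_v_numbers is hypothetical.

```shell
#!/bin/sh
# Copy field 2, remove a leading "v" from the copy, then force it to a
# number with += 0 so that n > 20 is always a numeric comparison.
filter_v_numbers() {
    awk '{ n = $2; sub(/^v/, "", n); n += 0; if (n > 20) print n }'
}

printf 'lorem v3\nipsum v5\ntext v21\nexpla v12\n' | filter_v_numbers
# prints: 21
```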


    Addressing the latest change to the question:

    Sample input:

    $ cat input.dat
    lorem v3
    ipsum v5
    text v21
    expla v12
    

    Running our awk code against input.dat (with an additional print that echoes each input line, for clarity):

    $ awk ' { print "######",$0; sub("^v","",$2); if ( $2+0 > 20 ) print $2 } ' input.dat
    ###### lorem v3
    ###### ipsum v5
    ###### text v21
    21
    ###### expla v12
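    If the threshold needs to vary, the same one-liner can be wrapped in a shell function that passes the limit in with awk's -v option. This is a sketch: print_above and the awk variable limit are hypothetical names, and it assumes the second field is a number with an optional leading "v".

```shell
#!/bin/sh
# Print the numeric part of field 2 (after removing a leading "v")
# when it is greater than the threshold given as the first argument.
# Both sides get +0 so the comparison is always numeric.
print_above() {
    awk -v limit="$1" '{ sub(/^v/, "", $2); if ($2 + 0 > limit + 0) print $2 }'
}

printf 'lorem v3\nipsum v5\ntext v21\nexpla v12\n' | print_above 20   # prints: 21
printf 'lorem v3\nipsum v5\ntext v21\nexpla v12\n' | print_above 10   # prints: 21, then 12
```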