I'm still a newbie to awk. What am I doing wrong? Apologies for the poor description; I have reformulated the question.
Goal
Only print the number of the second field if the number is > 20
lorem v3 <--- no print
ipsum v5 <--- no print
text v21 <--- print "21"
expla v12 <--- no print
My attempt that does not work
awk ' { sub("^v","",$2); if ( $2 > 20 ) print $2 } '
Addressing OP's question about why the current code outputs 3:
Initially awk doesn't know if $2 is a number or a string.
The sub() call (a string function) tells awk that $2 is to be treated as a string, which also means $2 will be treated as a string for the rest of the processing of the current line.
This leads to $2 > 20 being treated as a string comparison ('3' > '20'). Strings are compared character by character, and the character '3' sorts after the character '2', so '3' (the string) is greater than '20' (the string) and a 3 is output.
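To see the difference between the two kinds of comparison in isolation, here is a minimal sketch run on the single line "lorem v3". The "string:" and "numeric:" labels are only there for illustration; they are not part of OP's code.

```shell
# After sub(), $2 holds the string "3", so ($2 > 20) compares "3" with "20" as text.
echo "lorem v3" | awk '{ sub("^v","",$2); if ($2 > 20) print "string:", $2 }'
# prints: string: 3

# Adding 0 forces a numeric value, so 3 > 20 is false and nothing is printed.
echo "lorem v3" | awk '{ sub("^v","",$2); if ($2+0 > 20) print "numeric:", $2 }'
```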
To facilitate a numeric comparison we need a way to force awk to re-evaluate $2 as a number. One method is to add a zero, i.e., $2+0. Making this one change to OP's current code:
$ echo "lorem v3" | awk ' { sub("^v","",$2); if ( $2+0 > 20 ) print $2 } '
<<< no output
NOTE: for more details see GNU awk - variable typing
Addressing the latest change to the question:
Sample input:
$ cat input.dat
lorem v3
ipsum v5
text v21
expla v12
Running our awk code (additional print added for clarification) against input.dat:
$ awk ' { print "######",$0; sub("^v","",$2); if ( $2+0 > 20 ) print $2 } ' input.dat
###### lorem v3
###### ipsum v5
###### text v21
21
###### expla v12
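As a variation (not part of the code above, just a sketch under the assumption that the second field is always a single "v" followed by digits), the number can be pulled out with substr() and coerced with +0 without modifying $2 at all. The variable name n is arbitrary, and the sample lines are fed with printf so the snippet does not depend on input.dat:

```shell
# Take everything after the first character of $2, force it to a number,
# and print it only when it is greater than 20.
printf 'lorem v3\nipsum v5\ntext v21\nexpla v12\n' |
awk '{ n = substr($2, 2) + 0; if (n > 20) print n }'
# prints: 21
```

Because $2 is never reassigned here, $0 is left untouched, which may matter if the original line needs to be printed later.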