
awk field separator distinguish between -1, 10, and 1


Let's say I have a text file:

>>cat tmp.txt
1 1 1 1 1 -1 -1 -1 -1 -1 10 10 10 10 10

I want to count how many times each of the numbers 1, -1, and 10 occurs. This is what I have so far:

awk -F '-1' '{print NF-1,NR}' tmp.txt | awk '{print $1}'
awk -F '10' '{print NF-1,NR}' tmp.txt | awk '{print $1}'
awk -F '1' '{print NF-1,NR}' tmp.txt | awk '{print $1}'

The output is 5, 5, and 15 instead of 5, 5, and 5. It appears that the final command counts every 1 character, including the ones inside -1 and 10. How can this be handled properly?


Solution

  • Your 1 field separator matches the character 1 anywhere in the line, regardless of context, so the 1 inside each -1 and each 10 is counted as well. That is where the 15 comes from.

    A valid awk approach here is to keep the default whitespace field separator and count the fields whose value is exactly 1.

    I suggest using

    awk '{a=0;for(i=1;i<=NF;i++) { if($i=="1") {a++} };print a}' tmp.txt
    


    This awk command sets the variable a to 0, iterates over all the fields (with for(i=1;i<=NF;i++) {...}), increments a for each field whose value is exactly 1 (see if($i=="1") {a++}), and then prints a. To count -1 or 10 instead, change the string in the comparison.
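
The same idea can be wrapped so that one command handles 1, -1, and 10. This is a minimal sketch, not the only way to do it: count_value is a hypothetical helper name, the value to look for is passed in through awk's -v option as a variable called target (also an assumed name), and both sides of the comparison are concatenated with "" to force a string comparison, mirroring the $i=="1" test above.

```shell
# Recreate the sample input from the question.
printf '1 1 1 1 1 -1 -1 -1 -1 -1 10 10 10 10 10\n' > tmp.txt

# count_value (hypothetical helper): for each input line, print how many
# whitespace-separated fields are exactly equal to the given value.
#   $1 = value to count, $2 = input file
count_value() {
    awk -v target="$1" '{
        n = 0
        for (i = 1; i <= NF; i++)
            # Appending "" forces a string comparison on both sides.
            if (($i "") == (target "")) n++
        print n
    }' "$2"
}

count_value 1 tmp.txt     # prints 5
count_value -1 tmp.txt    # prints 5
count_value 10 tmp.txt    # prints 5
```

Because whole fields are compared rather than characters, the 1 inside -1 or 10 is no longer counted.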