awk gawk

Why does awk "not in" array work just like awk "in" array?


Here's an awk script that attempts to compute the set difference of two files based on their first column:

BEGIN{
    OFS=FS="\t"
    file = ARGV[1]
    while (getline < file)
        Contained[$1] = $1
    delete ARGV[1]
    }
$1 not in Contained{
    print $0
}

Here is TestFileA:

cat
dog
frog

Here is TestFileB:

ee
cat
dog
frog

When I run the following command:

gawk -f Diff.awk TestFileA TestFileB

I get the same output as if the script had used "in":

cat
dog
frog

I'm not sure whether "not in" is even valid syntax for what I intend, but I'm very curious why it behaves exactly like "in".


Solution

  • I cannot find any documentation for an element not in array construct; awk has no not operator.

    Try !(element in array) instead.


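    My guess: awk sees not as an uninitialized variable, so not evaluates to an empty string. String concatenation binds more tightly than in, so $1 not in Contained is parsed as ($1 not) in Contained, and:

    ($1 not) == ($1 "") == $1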
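To make the explanation above concrete, here is a minimal sketch run from the shell. It first checks how awk parses x not in A, then runs the question's script with the test negated explicitly. The scratch directory and the variable names tmpdir, parsed, and result are made up for this illustration; the file contents are the ones from the question. The extra "> 0" around getline is my own addition, assuming you want the loop to stop on a read error instead of spinning forever.

```shell
# Scratch directory holding the two sample files from the question.
tmpdir=$(mktemp -d)
printf 'cat\ndog\nfrog\n' > "$tmpdir/TestFileA"
printf 'ee\ncat\ndog\nfrog\n' > "$tmpdir/TestFileB"

# 1) "not" is just an unset variable: (x not) is x concatenated with "".
parsed=$(awk 'BEGIN {
    A["cat"]; x = "cat"
    if (x not in A) print "yes"; else print "no"   # same as: x in A
}')
echo "x not in A is true for an existing key: $parsed"

# 2) The question's script with the negation written as !(... in ...).
result=$(awk '
BEGIN {
    OFS = FS = "\t"
    file = ARGV[1]
    while ((getline < file) > 0)    # > 0: stop on EOF (0) and on error (-1)
        Contained[$1] = $1
    delete ARGV[1]                  # only the second file is read as main input
}
!($1 in Contained) {                # keep lines whose first column is new
    print $0
}' "$tmpdir/TestFileA" "$tmpdir/TestFileB")
echo "$result"

rm -rf "$tmpdir"
```

With the sample files, the only line of TestFileB whose first column is missing from TestFileA is ee, so that is what the second command prints.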