awk

How to feed into an AWK program a series of field names and have the AWK program check files for the presence of the field name?


field-names.txt contains a list of field names:

AB_CODE
ACFT_CODE
AC_TYPE
ADD_INFO
AKA
ALT
ALT1_DESC
ALT2_DESC
ALT3_DESC

For each field name, I want to print the files whose first line (a space-separated list of fields) contains that field name. Here's what I tried:

At a bash command line I entered this:

cat field-names.txt | awk 'BEGIN { getline fieldname; print fieldname }
        
NR == 1 && $0 ~ /fieldname/ { print FILENAME }' **/*.TXT

That produces the wrong result. What is the correct way to do this?


Solution

  • This awk solution should work for you:

    awk 'FNR == NR {
       rx = (rx == "" ? "" : rx "|") $1
       next
    }
    FNR == 1 && " " $0 " " ~ " (" rx ") " {
       print FILENAME
    }' field-names.txt **/*.TXT
    

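    The first block runs only while field-names.txt is being read (FNR == NR holds only for the first file) and joins its lines with | into a single alternation regex. The second block then tests the first line of every other file against that regex and prints the file name when at least one field name matches. Both the line and the regex are padded with a space on each side, so only whole field names match, not substrings such as ALT inside ALT1_DESC.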
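To see the whole-word matching in action, here is a small sketch that runs the same awk program against throwaway sample data. The temporary directory, the files a.TXT and b.TXT, and their contents are made up for illustration and are not part of the original question.

```shell
# Hypothetical sample data in a temporary directory (illustration only).
tmp=$(mktemp -d)
printf '%s\n' ALT AKA > "$tmp/field-names.txt"
printf '%s\n' 'ID ALT NAME' 'some data row' > "$tmp/a.TXT"        # header has the field ALT
printf '%s\n' 'ID ALT1_DESC NAME' 'some data row' > "$tmp/b.TXT"  # header only has ALT1_DESC

# Same program as above: build "ALT|AKA", then test each header line.
result=$(cd "$tmp" && awk 'FNR == NR {
   rx = (rx == "" ? "" : rx "|") $1
   next
}
FNR == 1 && " " $0 " " ~ " (" rx ") " {
   print FILENAME
}' field-names.txt a.TXT b.TXT)

echo "$result"   # prints: a.TXT
rm -rf "$tmp"
```

b.TXT is not reported because " ALT" in its header is followed by "1" rather than a space, so the padded pattern " (ALT|AKA) " does not match.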


    As an optimization, the padded regex can be built only once, right after field-names.txt has been read, and nextfile can skip the rest of each matching file:

    awk 'FNR == NR {
       rx = (rx == "" ? "" : rx "|") $1
       pNR = NR
       next
    }
    NR == pNR+1 {
       rx = " (" rx ") "
    }
    FNR == 1 && " " $0 " " ~ rx {
       print FILENAME
       nextfile
    }' field-names.txt **/*.TXT
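The programs above print a file when any of the field names matches. If the goal is instead to see which field name was found in which file, one possible variant (a sketch, not the answer's method) stores the names as array keys and compares each header field exactly, which also avoids treating the names as regex text. The array name want and the sample files below are hypothetical.

```shell
# Hypothetical sample data in a temporary directory (illustration only).
tmp=$(mktemp -d)
printf '%s\n' ALT AKA AC_TYPE > "$tmp/field-names.txt"
printf '%s\n' 'AB ALT AKA' 'some data row' > "$tmp/a.TXT"
printf '%s\n' 'AC_TYPE ALT1_DESC' 'some data row' > "$tmp/b.TXT"

report=$(cd "$tmp" && awk 'FNR == NR {
   want[$1]          # remember each field name as an array key
   next
}
FNR == 1 {
   # Compare every header field exactly against the stored names.
   for (i = 1; i <= NF; i++)
      if ($i in want)
         print $i ": " FILENAME
   nextfile          # only the first line of each file is needed
}' field-names.txt a.TXT b.TXT)

echo "$report"
# prints:
# ALT: a.TXT
# AKA: a.TXT
# AC_TYPE: b.TXT
rm -rf "$tmp"
```

With real data the file arguments would be field-names.txt **/*.TXT as in the answer. Note that in bash, ** only recurses into subdirectories after shopt -s globstar has been enabled; otherwise it behaves like a single *.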