field-names.txt contains a list of field names:
AB_CODE
ACFT_CODE
AC_TYPE
ADD_INFO
AKA
ALT
ALT1_DESC
ALT2_DESC
ALT3_DESC
For each field name, I want to print the files whose first line (a space-separated list of fields) contains that field name. Here's what I tried:
At a bash command line I entered this:
cat field-names.txt | awk 'BEGIN { getline fieldname; print fieldname }
NR == 1 && $0 ~ /fieldname/ { print FILENAME }' **/*.TXT
That produces the wrong result. What is the correct way to do this?
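Your attempt fails for a few reasons: /fieldname/ is a regex literal that matches the text "fieldname", not the contents of the variable; NR == 1 is true only for the very first record overall, whereas FNR restarts at 1 for each file; and because file names are given as arguments, awk never reads the piped output of cat. This awk solution should work for you: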
awk 'FNR == NR {                           # first file: field-names.txt
    rx = (rx == "" ? "" : rx "|") $1       # join the names with |
    next
}
FNR == 1 && " " $0 " " ~ " (" rx ") " {    # first line of each data file
    print FILENAME
}' field-names.txt **/*.TXT
First we build a regex by joining the lines of field-names.txt with | in the first block, which runs while FNR == NR (that is, while reading the first file). Then we match that regex against the first line of each remaining file and print the file name if any of the field names is found. We pad both the first line and the regex with spaces to make sure we match only whole field names, not partial ones: ALT will not match ALT1_DESC.
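To see why the space padding matters, here is a minimal sketch that tests a single header line against the alternation. `has_field` is a hypothetical helper name used only for this illustration, and the header strings are made-up sample data.

```shell
# has_field (hypothetical helper): succeed if the header line in $1 contains,
# as a whole space-separated word, one of the |-joined names in $2.
has_field() {
  awk -v line="$1" -v rx="$2" 'BEGIN {
    # Pad both sides with spaces, the same way the answer above does.
    exit !((" " line " ") ~ (" (" rx ") "))
  }'
}

has_field "ID ALT1_DESC NAME" "ALT|AKA" && echo match || echo "no match"   # prints: no match
has_field "ID ALT NAME" "ALT|AKA" && echo match || echo "no match"         # prints: match
```

This assumes the field names contain no regex metacharacters, which holds for the names listed in field-names.txt.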
As an optimization, we can build the full regex only once and call nextfile to skip the rest of a file once its first line has matched:
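awk 'FNR == NR {                       # first file: field-names.txt
    rx = (rx == "" ? "" : rx "|") $1   # join the names with |
    pNR = NR                           # remember the last line number of the list
    next
}
NR == pNR+1 {                          # first line after the list: wrap the regex once
    rx = " (" rx ") "
}
FNR == 1 && " " $0 " " ~ rx {          # first line of each data file
    print FILENAME
    nextfile                           # skip the rest of this file
}' field-names.txt **/*.TXT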
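Both commands above print a file when its first line contains any of the listed names. If the goal is to see which field matched in which file, one possible variation is sketched below. It is not part of the solution above: it swaps the regex for an array lookup, and `list_files_by_field` is a hypothetical function name chosen for this example.

```shell
# list_files_by_field (hypothetical helper): the first argument is the field
# list, the remaining arguments are the data files. Prints "FIELD FILE" pairs.
list_files_by_field() {
  awk '
    FNR == NR { want[$1]; next }   # first file: remember each field name
    FNR == 1 {                     # first line of each data file
      for (i = 1; i <= NF; i++)    # compare every header field exactly
        if ($i in want) print $i, FILENAME
    }' "$@"
}

# Possible usage in bash (globstar is needed for ** to recurse):
#   shopt -s globstar
#   list_files_by_field field-names.txt **/*.TXT | sort
```

Because each header field is compared as a whole string, no space padding is needed, and piping the output through sort groups the file names by field. Like the commands above, the sketch assumes field-names.txt is not empty.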