The first line of each file contains field names, and there may be duplicates among them. I want to print only the unique field names. Here's what I tried in a Bash script, files_and_folders.sh:
#!/bin/bash
for file in **/*.TXT ; do
    awk 'NR == 1 { for (i=1; i<=NF; i++) if (!seen[$i]) seen[$i] = 1 } END { for (idx in seen) printf ("%s\n", idx) }' "${file}"
done
The Bash script ran successfully, but the output contains duplicates:
AB_CODE
ACFT_CODE
AC_TYPE
ADD_INFO
AKA
ALT
ALT
ALT
ALT
ALT
ALT
ALT
ALT1_DESC
ALT2_DESC
ALT3_DESC
How can I modify the AWK program (in the Bash script) to eliminate the duplicates?
You must not loop in Bash and start a new awk process for each file; otherwise the associative array seen is initialized fresh for every invocation, so awk has no knowledge of the entries set by previous invocations.
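To see the state loss concretely, here is a minimal, self-contained reproduction (the file names and header rows are invented for illustration):

```shell
#!/bin/bash
# Tiny reproduction of the state loss; file names and headers are invented.
demo=$(mktemp -d)
printf 'ALT AKA\n'     > "$demo/one.TXT"
printf 'ALT AB_CODE\n' > "$demo/two.TXT"

# Each iteration starts a fresh awk process, so seen[] is empty every time:
result=$(for file in "$demo"/*.TXT; do
    awk 'NR == 1 { for (i=1; i<=NF; i++) if (!seen[$i]) seen[$i] = 1 }
         END { for (idx in seen) print idx }' "$file"
done)

printf '%s\n' "$result"   # ALT is printed once per file, hence the duplicates
rm -rf "$demo"
```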
You should do it in a single awk like this:
awk 'FNR == 1 {
    for (i = 1; i <= NF; ++i)
        uniques[$i]    # referencing the key is enough to create it; no value needed
}
END {
    for (i in uniques)
        print i
}' **/*.TXT
Note that ** requires shopt -s globstar in Bash (it is off by default), and the for (i in uniques) loop visits keys in arbitrary order, so the output is unsorted:
AC_TYPE
AKA
ALT
ADD_INFO
AB_CODE
ALT1_DESC
ALT2_DESC
ALT3_DESC
ACFT_CODE
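Putting it all together, here is a self-contained sketch of the single-awk approach; the directory layout and header rows below are invented for illustration:

```shell
#!/bin/bash
# Self-contained sketch; the directory layout and header rows are invented.
shopt -s globstar nullglob   # ** needs globstar (off by default in Bash)

demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
printf 'ALT AKA ALT\n1 2 3\n'         > "$demo/a/one.TXT"
printf 'ALT AB_CODE AC_TYPE\n4 5 6\n' > "$demo/b/two.TXT"

# A single awk process reads every file, so uniques[] persists across files.
# Piping through sort just makes the arbitrary for-in order deterministic.
result=$(cd "$demo" && awk '
    FNR == 1 { for (i = 1; i <= NF; ++i) uniques[$i] }
    END      { for (f in uniques) print f }
' **/*.TXT | sort)

printf '%s\n' "$result"   # AB_CODE, AC_TYPE, AKA, ALT -- each exactly once
rm -rf "$demo"
```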