The first line of each file contains field names, and there may be duplicates among them. I want to print only the unique field names. Here's what I tried:
In a Bash script, files_and_folders.sh, I entered this:
#!/bin/bash
for file in **/*.TXT ; do
    awk 'NR == 1 { for (i = 1; i <= NF; i++) if (!seen[$i]) seen[$i] = 1 } END { for (idx in seen) printf("%s\n", idx) }' "${file}"
done
The script ran successfully, but the output contains duplicates:
AB_CODE
ACFT_CODE
AC_TYPE
ADD_INFO
AKA
ALT
ALT
ALT
ALT
ALT
ALT
ALT
ALT1_DESC
ALT2_DESC
ALT3_DESC
How can I modify the AWK program (in the Bash script) to eliminate the duplicates?
You must not run a loop in bash that starts a new awk process for each file: the associative array seen is initialized afresh in every awk invocation, so it can't know about entries set by previous invocations of awk. You should do it all in a single awk call, like this:
awk 'FNR == 1 {                 # FNR resets per file, so this matches every header line
    for (i = 1; i <= NF; ++i)
        uniques[$i]             # merely referencing an element creates the key
}
END {
    for (i in uniques)
        print i
}' **/*.TXT
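This prints each field name exactly once, in no particular order, because for (i in uniques) visits keys in an unspecified order; pipe the output through sort if you need it sorted: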
AC_TYPE
AKA
ALT
ADD_INFO
AB_CODE
ALT1_DESC
ALT2_DESC
ALT3_DESC
ACFT_CODE
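Note that in bash the ** glob only recurses into subdirectories when shopt -s globstar is enabled (bash 4 or later); without it, ** behaves like a plain *. If for some reason you do need to keep the per-file loop, a minimal alternative sketch (assuming the same **/*.TXT layout from your question) is to print every header field and deduplicate across invocations afterwards with sort -u:
#!/bin/bash
# Alternative sketch: keep the loop, but deduplicate across all awk
# invocations by piping the combined output through sort -u.
shopt -s globstar nullglob    # make ** recurse; skip the loop if nothing matches
for file in **/*.TXT ; do
    awk 'NR == 1 { for (i = 1; i <= NF; i++) print $i }' "$file"
done | sort -u
This still spawns one awk per file, but the deduplication happens after the loop, so the per-process reset of the array no longer matters; as a side effect, the names come out sorted.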