I need to count the total number of records across 6 files, each containing 4 million records (the count should be as fast as possible). However, there is another file with a similar name that should be omitted.
fileSales_1.txt (4 million records)
fileSales_2.txt (4 million records)
fileSales_3.txt (4 million records)
fileSales_4.txt (4 million records)
fileSales_5.txt (4 million records)
fileSales_6.txt (4 million records)
fileSales_unique.txt (24 million records)
I'm counting the records with the following command: awk 'END {print NR}' fileSales_*.txt
However, this glob also matches fileSales_unique.txt, so that file gets counted too, giving a total of 48 million records.
Could you help me with a command that counts only the records in files 1 to 6? The result should be 24 million records, i.e. something like awk 'END {print NR}' fileSales_(1 to 6).txt
Suppose you have these files (using wc -l to show line counts alongside file names):
4000000 fileSales_1.txt
4000000 fileSales_2.txt
4000000 fileSales_3.txt
4000000 fileSales_4.txt
4000000 fileSales_5.txt
4000000 fileSales_6.txt
24000000 fileSales_unique.txt
24000000 fileSales_unique_also.txt
72000000 total
There are many ways to achieve your goal; here are two primary ones.
Inclusion pattern (brace expansion or a glob) that matches only files 1 to 6:
wc -l fileSales_{1..6}.txt
wc -l fileSales_?.txt
wc -l fileSales_[1-6].txt
({1..6} is brace expansion, so it produces all six names even if one of the files is missing; ? and [1-6] are globs that only match existing files, and ? is the least precise since it matches any single character after the underscore.)
Any of those gives:
$ wc -l fileSales_[1-6].txt
4000000 fileSales_1.txt
4000000 fileSales_2.txt
4000000 fileSales_3.txt
4000000 fileSales_4.txt
4000000 fileSales_5.txt
4000000 fileSales_6.txt
24000000 total
(The same concept applies to awk.)
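For example, your original awk command with the range glob, which should print 24000000 given the counts above:
awk 'END {print NR}' fileSales_[1-6].txt
# 24000000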
Or, maintain a skip array in Bash:
skip=( *_unique* )                      # glob: every file we want to exclude
to_cnt_files=()
for fn in fileSales*.txt; do
    for s in "${skip[@]}"; do           # exact-name membership test
        [[ $fn == "$s" ]] && continue 2 # in the skip list: move to the next file
    done
    to_cnt_files+=( "$fn" )
done
Then your method works:
awk 'END {print NR}' "${to_cnt_files[@]}"   # pass the array directly; no word-splitting issues
# 24000000
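The same array works directly with wc as well:
wc -l "${to_cnt_files[@]}"
# per-file counts, then: 24000000 total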
Note that wc -l will most likely be dramatically faster than awk here, since wc only has to count newlines while awk parses every record.
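If speed matters, it's easy to measure on your own data (a quick sketch; actual timings depend on your system and disk cache):
time wc -l fileSales_[1-6].txt
time awk 'END {print NR}' fileSales_[1-6].txt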