Counting Records in Linux Files Excluding Some Files

I need to count the records in 6 files, each containing 4 million records, and the count should be as fast as possible. However, there is another file with a similar name that must be left out.

fileSales_1.txt (4 million records)

fileSales_2.txt (4 million records)

fileSales_3.txt (4 million records)

fileSales_4.txt (4 million records)

fileSales_5.txt (4 million records)

fileSales_6.txt (4 million records)

fileSales_unique.txt (24 million records)

I'm counting the records with the following command: awk 'END {print NR}' fileSales_*.txt

However, this also counts the fileSales_unique.txt file, giving a total of 48 million records.

Could you help me with a command that counts only the records in files 1 to 6? The result should be 24 million records, something like: awk 'END {print NR}' fileSales_(1 to 6).txt

Solution

Suppose you have these files (using wc -l to show both file names and line counts):

 4000000 fileSales_1.txt
 4000000 fileSales_2.txt
 4000000 fileSales_3.txt
 4000000 fileSales_4.txt
 4000000 fileSales_5.txt
 4000000 fileSales_6.txt
 24000000 fileSales_unique.txt
 24000000 fileSales_unique_also.txt
 72000000 total

There are many ways to achieve this, but two main ones:

Use a glob that only includes the desired files;
Use an exclusion list or pattern that excludes the undesired files.

Inclusion glob:

wc -l fileSales_{1..6}.txt
wc -l fileSales_?.txt
wc -l fileSales_[1-6].txt

Any of those works:

$ wc -l fileSales_[1-6].txt  
 4000000 fileSales_1.txt
 4000000 fileSales_2.txt
 4000000 fileSales_3.txt
 4000000 fileSales_4.txt
 4000000 fileSales_5.txt
 4000000 fileSales_6.txt
 24000000 total

(Same concept applies to awk)
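
As a minimal sketch of the awk variant, the snippet below builds a small stand-in data set in a scratch directory and counts only files 1 to 6 with the [1-6] glob. The 4-line files and the demo_dir, awk_total, and wc_total names are assumptions for illustration, not the real data. It also pipes cat into wc -l, which gives a single total without the per-file lines.

```shell
# Hypothetical demo data in a scratch directory (not the real 4-million-line files).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Six numbered files with 4 lines each, plus the file that must be ignored.
for i in 1 2 3 4 5 6; do
    printf 'a\nb\nc\nd\n' > "fileSales_$i.txt"
done
awk 'BEGIN { for (i = 1; i <= 24; i++) print i }' > fileSales_unique.txt

# awk: NR keeps increasing across all files named on the command line.
awk_total=$(awk 'END {print NR}' fileSales_[1-6].txt)

# wc: concatenating first yields one number instead of a per-file listing.
wc_total=$(cat fileSales_[1-6].txt | wc -l)

echo "awk: $awk_total"
echo "wc:  $wc_total"
```

Note that {1..6} is brace expansion, which produces all six names whether or not the files exist, while ? and [1-6] are globs that match only existing files. The ? glob would also match any other single character, such as fileSales_7.txt or fileSales_a.txt.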

Or, maintain a skip array in Bash:

skip=( *_unique* )
to_cnt_files=()
for fn in fileSales*.txt; do 
    [[ "${skip[@]/$fn/}" != "${skip[@]}" ]] && continue
    to_cnt_files+=( "$fn" )
done

Then your method works:

awk 'END{print NR}' "${to_cnt_files[@]}"
# 24000000
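
The exclusion-pattern approach can also be sketched without a loop by letting find select on the broad name and reject anything containing "unique". This is only one possible way to do it, under stated assumptions: the scratch directory, the tiny sample files, and the demo_dir and excl_total names are hypothetical.

```shell
# Hypothetical demo data in a scratch directory (not the real files).
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

for i in 1 2 3 4 5 6; do
    printf 'a\nb\n' > "fileSales_$i.txt"       # 2 lines each, 12 in total
done
printf 'x\ny\nz\n' > fileSales_unique.txt       # must be excluded
printf 'x\ny\nz\n' > fileSales_unique_also.txt  # must be excluded

# Match the broad name, then drop every file whose name contains "unique".
# cat concatenates the survivors so wc -l reports one total.
excl_total=$(find . -type f -name 'fileSales_*.txt' ! -name '*unique*' -exec cat {} + | wc -l)

echo "$excl_total"
```

Keep in mind that find descends into subdirectories, so in a directory tree with nested folders the search would need to be restricted further.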

Note that wc -l will likely be much faster than awk in this case, since it only has to count newline characters.