I have a list of names and IDs (50 entries):
cat input.txt
name ID
Mike 2000
Mike 20003
Mike 20002
And there is a huge gzipped file (13 GB):
zcat clients.gz
name ID comment
Mike 2000 foo
Mike 20002 bar
Josh 2000 cake
Josh 20002 _
My expected output is:
NR name ID comment
1 Mike 2000 foo
3 Mike 20002 bar
Each $1"\t"$2 pair in clients.gz is a unique identifier. Some entries from input.txt might be missing from clients.gz, so I would like to add the NR column to my output to find out which ones are missing. I would like to use zgrep; awk takes a very long time (I assume because I have to zcat the file to uncompress it first?).
I know that zgrep 'Mike\t2000' does not work. I imagine I can fix the NR issue with awk's FNR.
So far I have:
awk -v q="'" '
NR > 1 {
print "zcat clients.gz | zgrep -w $" q$0q
}' input.txt |
bash > subset.txt
$ cat tst.awk
BEGIN { FS=OFS="\t" }
{ key = $1 FS $2 }
NR == FNR { map[key] = (NR>1 ? NR-1 : "NR"); next }
key in map { print map[key], $0 }
$ zcat clients.gz | awk -f tst.awk input.txt -
NR name ID comment
1 Mike 2000 foo
3 Mike 20002 bar
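
Since the goal is to find out which entries of input.txt are missing from clients.gz, the same single pass can also report them directly. The following is a minimal sketch, not part of the script above: it assumes both files are tab-separated, builds hypothetical sample data in a temporary directory that mirrors the question, and introduces a `seen` array (my own name) to remember matched keys. It uses `gzip -dc`, which behaves like `zcat` for .gz files.

```shell
# Hypothetical sample data mirroring the question (tab-separated).
tmp=$(mktemp -d)
printf 'name\tID\nMike\t2000\nMike\t20003\nMike\t20002\n' > "$tmp/input.txt"
printf 'name\tID\tcomment\nMike\t2000\tfoo\nMike\t20002\tbar\nJosh\t2000\tcake\nJosh\t20002\t_\n' |
  gzip > "$tmp/clients.gz"

# Read input.txt first to build the key -> row-number map, then stream the
# decompressed clients file once, marking every key that shows up.
# At the end, print the row number and key of each entry never seen.
missing=$(
  gzip -dc "$tmp/clients.gz" |
  awk '
    BEGIN { FS = OFS = "\t" }
    { key = $1 FS $2 }
    NR == FNR { map[key] = (NR > 1 ? NR - 1 : "NR"); next }
    key in map { seen[key] = 1 }
    END {
      for (key in map)
        if (!(key in seen))
          print map[key], key
    }
  ' "$tmp/input.txt" -
)
printf '%s\n' "$missing"
```

With the sample data, only the second entry of input.txt (Mike 20003) is reported. The `for (key in map)` loop visits keys in an unspecified order, so with all 50 entries you may want to pipe the result through `sort -n`.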