
Big grep from txt list in .gz file logs


This is my problem (for me, actually a big problem).

I have a txt file with 1,130,395 lines. An example is below:

10812
10954
10963
11070
11099
10963
11070
11099
betti.bt
betti12
betti1419432307
19442407
19451970
19461949

I have about 2000 .gz log files.

For every line of the .txt file, I need to run a grep against all of the .gz files.

Here are two example lines from the .gz files:

time=2019-02-28 00:03:32,299|requestid=30ed0f2b-9c44-47d0-abdf-b3a04dbb560e|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:juvexamore,token:b73ad88b-b201-33ce-a924-6f4eb498e01f,userIp:10.94.66.74,dtt:No|result=SUCCESS
time=2019-02-28 00:03:37,096|requestid=8ebca6cd-04ee-4818-817d-30f78ee95731|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:10963,token:1d99be3e-325f-3982-a668-30494cab9a96,userIp:10.94.66.74,dtt:No|result=SUCCESS

The txt file contains the usernames. For each username, I need to check whether it appears in the .gz files on a line whose url contains "profile" and that has "result=SUCCESS".

If a match is found, write only this to a log file: the username found; the name of the log file in which it was found.

Is it possible to do this? I know that I need to use the zgrep command, but can someone help me? Is it possible to automate the process so that it runs on its own?

Thanks all


Solution

  • I'd just do (untested):

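    # -H prefixes each match with its file name, so with -F'[=:,]' the
    # file name is $1 and the user name after "username:" is $12.
    zgrep -H 'url=/user/profile/oauth/{token}|params=username:.*result=SUCCESS' *.gz |
    awk -F'[=:,]' -v OFS=';' 'NR==FNR{names[$0];next} $12 in names{print $12, $1}' names.txt - |
    sort -u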
    

    or, probably a little more efficient since it avoids the NR==FNR test for every line output by zgrep:

    zgrep -H 'url=/user/profile/oauth/{token}|params=username:.*result=SUCCESS' *.gz |
    awk -F'[=:,]' -v OFS=';' '
        BEGIN {
            while ( (getline line < "names.txt") > 0 ) {
                names[line]
            }
            close("names.txt")
        }
        $12 in names{print $12, $1}' |
    sort -u
    

    If a given user name can only appear once in a given log file, or if you actually want multiple occurrences to produce multiple output lines, then you don't need the final | sort -u.
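
Since the commands above are marked untested, it may help to try them on a small throwaway data set before running them on the real logs. The following is only a sketch under stated assumptions: the file names app1.log.gz and app2.log.gz, the log_line helper, and the result=FAILURE value are made up for illustration. Only the line format and the user names come from the question.

```shell
#!/bin/sh
# Hypothetical demo data; nothing here touches the real logs.
set -eu

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# log_line USER RESULT: print one log line in the format shown in the question.
log_line() {
    printf 'time=2019-02-28 00:03:37,096|requestid=8ebca6cd-04ee-4818-817d-30f78ee95731|severity=INFO |severitynumber=0|url=/user/profile/oauth/{token}|params=username:%s,token:1d99be3e-325f-3982-a668-30494cab9a96,userIp:10.94.66.74,dtt:No|result=%s\n' "$1" "$2"
}

# names.txt: one user name per line, taken from the question's sample list.
printf '%s\n' 10812 10963 betti12 > names.txt

# First log: one user that is not in names.txt, one that is.
{ log_line juvexamore SUCCESS; log_line 10963 SUCCESS; } | gzip > app1.log.gz

# Second log: a duplicated match plus a listed user whose request did not
# succeed (FAILURE is an assumed value, the question only shows SUCCESS).
{ log_line 10963 SUCCESS; log_line 10963 SUCCESS; log_line betti12 FAILURE; } | gzip > app2.log.gz

# Same pipeline as the first variant in the answer.
result=$(
    zgrep -H 'url=/user/profile/oauth/{token}|params=username:.*result=SUCCESS' *.gz |
    awk -F'[=:,]' -v OFS=';' 'NR==FNR{names[$0];next} $12 in names{print $12, $1}' names.txt - |
    sort -u
)
printf '%s\n' "$result"
```

On this sample the output should be the two lines `10963;app1.log.gz` and `10963;app2.log.gz`: juvexamore is not in names.txt, the betti12 line is dropped because it lacks result=SUCCESS, and sort -u collapses the duplicated match in the second file. Note that the field numbers rely on the log file names not containing `=`, `:` or `,`.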