Tags: string, bash, grep

Print matched pattern in a file along with matched lines


In a complex script I am using grep with a pattern file to extract the matching lines.

For example, here is the file containing the text:

$ cat file.txt
abc$(SEQ)asdasd
wwww$(SEQ)asqqqqqq
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz
klmn$(SEQ)11111111
op$(SEQ)44444444
qrs$(SEQ)777
tuv$(SEQ)mmmmmmmmm
qrs$(SEQ)777444
asdsd777hdhfgjdfasd
wxyzfhdfghdfh

And here is the pattern file:

$ cat pattern.txt
444
777
asd

I am using the following grep command to get the matched lines

The

On the command line I can see which pattern matched (it is highlighted), but that information is lost when the output is written to the logs. So I need a way to print each matched line together with the pattern(s) that matched it. The output should look something like this, with the pattern printed after a TAB (or in any other recognizable format):

abc$(SEQ)asdasd <TAB> asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz  <TAB> asd
op$(SEQ)44444444    <TAB>   444
qrs$(SEQ)777    <TAB>   777
qrs$(SEQ)777444  <TAB>  777444
asdsd777hdhfgjdfasd  <TAB>  asd777 

I can use grep with -o but I am not able to combine both (i.e. with and without -o) together.

It is not necessary to use grep, I am happy to use any other commands that can accomplish this.


Solution

  • One awk idea:

    awk '
    BEGIN   { sep1 = "\t"; sep2 = "," }                       # predefine our separators; modify as desired
    
    FNR==NR { ptns[$0]; next }                                # 1st file: save each line as a new index in our ptns[] array
    
            { sfx = ""                                        # 2nd file: reset our suffix
    
              for (ptn in ptns)                               # loop through the indices (aka patterns) of the ptns[] array
                  if (index($0,ptn))                          # if the pattern exists in the current line (ie, index() returns a value > 0) then ...
                     sfx = sfx (sfx == "" ? "" : sep2) ptn    # append the pattern to our suffix
    
              if (sfx != "")                                  # if the suffix is not blank then we found at least one match so ...
                 print $0 sep1 sfx                            # print current line and append the suffix
            }
    ' pattern.txt file.txt
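
The script above tests each pattern with index(), which looks for a literal substring rather than a regular expression, so the patterns appear to be treated much like fixed strings with grep -F. The following is a minimal sketch of that difference. The helper name literal_vs_regex and the pattern a.d are made up for illustration and are not part of the original answer.

```shell
# Hypothetical helper: compare a literal substring test (index) with a regex test (~).
literal_vs_regex() {
    # $1 = text to search, $2 = pattern
    # Values are passed through the environment so awk does not
    # interpret backslash escapes the way -v assignments would.
    TEXT=$1 PTN=$2 awk 'BEGIN {
        text = ENVIRON["TEXT"]; ptn = ENVIRON["PTN"]
        # index() returns the 1-based position of the substring, or 0 if absent;
        # (text ~ ptn) is 1 when ptn, used as a regex, matches anywhere in text.
        printf "index=%d regex=%d\n", index(text, ptn), (text ~ ptn)
    }'
}

literal_vs_regex 'asdsd777hdhfgjdfasd' 'a.d'   # index=0 regex=1
literal_vs_regex 'abc$(SEQ)asdasd' 'asd'       # index=10 regex=1
```

With index(), a pattern line such as a.d would only match lines containing those three exact characters, which is probably what is wanted when the pattern file holds plain strings.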
    

    Alternatively, place the body of the awk script in a file and then access via awk -f ...:

    $ cat my_grep.awk
    BEGIN   { sep1 = "\t"; sep2 = "," }
    FNR==NR { ptns[$0]; next }
            { sfx = ""
              for (ptn in ptns)
                  if (index($0,ptn))
                     sfx = sfx (sfx == "" ? "" : sep2) ptn
              if (sfx != "")
                 print $0 sep1 sfx
            }
    
    $ awk -f my_grep.awk pattern.txt file.txt
    

    NOTES:

    Both approaches generate:

    abc$(SEQ)asdasd asd
    efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz  asd
    op$(SEQ)44444444    444
    qrs$(SEQ)777    777
    qrs$(SEQ)777444 444,777
    asdsd777hdhfgjdfasd asd,777
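
One caveat about the output above: awk does not specify the order in which for (ptn in ptns) visits array indices, so the order of patterns in the suffix (444,777 versus asd,777) can vary between awk implementations. If a stable order matters for the logs, one possible variation is to store the patterns under numeric indices and loop over them in file order. This is only a sketch under that assumption; the function name match_in_order and the counter n are invented here for illustration.

```shell
# Hypothetical wrapper: $1 = pattern file, $2 = file to search.
match_in_order() {
    awk '
    BEGIN   { sep1 = "\t"; sep2 = "," }                       # same separators as the original script

    FNR==NR { ptns[++n] = $0; next }                          # 1st file: store patterns as ptns[1..n], in file order

            { sfx = ""                                        # 2nd file: reset our suffix

              for (i = 1; i <= n; i++)                        # visit patterns in the order they were read
                  if (index($0, ptns[i]))                     # literal substring test, as before
                     sfx = sfx (sfx == "" ? "" : sep2) ptns[i]

              if (sfx != "")                                  # at least one pattern matched
                 print $0 sep1 sfx
            }
    ' "$1" "$2"
}

# Usage, with the files from the question:
#   match_in_order pattern.txt file.txt
```

With the sample files, the last two lines should then come out as 444,777 and 777,asd, following the order of pattern.txt.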