In a complex script I am using grep to get the matched lines using a pattern file.
For example, here is the file containing the text:
$ cat file.txt
abc$(SEQ)asdasd
wwww$(SEQ)asqqqqqq
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz
klmn$(SEQ)11111111
op$(SEQ)44444444
qrs$(SEQ)777
tuv$(SEQ)mmmmmmmmm
qrs$(SEQ)777444
asdsd777hdhfgjdfasd
wxyzfhdfghdfh
and here is the pattern file:
$ cat pattern.txt
444
777
asd
I am using the following grep command to get the matched lines:
On the command line I can see which pattern matched, but not in the logs once the output is logged. So I need a way to print the matched line together with the pattern that matched it. The output should look something like this, with the pattern printed after a TAB (or in any other recognizable format):
abc$(SEQ)asdasd <TAB> asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz <TAB> asd
op$(SEQ)44444444 <TAB> 444
qrs$(SEQ)777 <TAB> 777
qrs$(SEQ)777444 <TAB> 777444
asdsd777hdhfgjdfasd <TAB> asd777
I can use grep with -o, but I am not able to combine both (i.e. with and without -o) in one command.
It is not necessary to use grep; I am happy to use any other commands that can accomplish this.
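If staying with grep is preferred, one possible sketch is to run a fixed-string grep (-F) once per pattern and append that pattern to every line it matches. This assumes a POSIX sh and grep; the function name grep_with_pattern is hypothetical. Unlike the expected output above, the result is grouped by pattern rather than by line, and a line matching several patterns is printed once per pattern.

```shell
#!/bin/sh
# Hypothetical helper: print each matching line, a TAB, then the pattern it matched.
# $1 = pattern file (one literal pattern per line), $2 = file to search.
grep_with_pattern() {
    ptn_file=$1
    txt_file=$2
    # read patterns one at a time; the || test also handles a last line without a newline
    while IFS= read -r ptn || [ -n "$ptn" ]; do
        [ -z "$ptn" ] && continue    # skip blank pattern lines
        # -F: literal match, -- protects patterns that start with a dash
        grep -F -- "$ptn" "$txt_file" |
            while IFS= read -r line; do
                printf '%s\t%s\n' "$line" "$ptn"
            done
    done < "$ptn_file"
}
```

Usage: grep_with_pattern pattern.txt file.txt. Because each pattern costs one pass over the input file, this is only reasonable for short pattern lists.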
One awk idea:
awk '
BEGIN { sep1 = "\t"; sep2 = "," } # predefine our separators; modify as desired
FNR==NR { ptns[$0]; next } # 1st file: save each line as a new index in our ptns[] array
{ sfx = "" # 2nd file: reset our suffix
for (ptn in ptns) # loop through the indices (aka patterns) of the ptns[] array
if (index($0,ptn)) # if the pattern exists in the current line (ie, index() returns a value > 0) then ...
sfx = sfx (sfx == "" ? "" : sep2) ptn # append the pattern to our suffix
if (sfx != "") # if the suffix is not blank then we found at least one match so ...
print $0 sep1 sfx # print current line and append the suffix
}
' pattern.txt file.txt
Alternatively, place the body of the awk script in a file and then access it via awk -f ...:
$ cat my_grep.awk
BEGIN { sep1 = "\t"; sep2 = "," }
FNR==NR { ptns[$0]; next }
{ sfx = ""
for (ptn in ptns)
if (index($0,ptn))
sfx = sfx (sfx == "" ? "" : sep2) ptn
if (sfx != "")
print $0 sep1 sfx
}
$ awk -f my_grep.awk pattern.txt file.txt
NOTES:
- this assumes the lines in pattern.txt do not have any leading/trailing white space, which would cause the index() call to fail to find a match
- (ptn in ptns) does not guarantee the order in which the patterns are processed, which means there's no guarantee of the order of said patterns when printed at the end of the line; while additional code could be added to address an ordering requirement, OP would need to provide more details on how to handle duplicate and/or overlapping patterns (eg, a and as would match at the same index() position, so which pattern would be considered the actual match?)
- index() will only find the 1st occurrence of a pattern, and we make no attempt to match beyond that first match, so this approach only tells us that there is at least one match; additional coding would be needed to determine the number of matches, but that would also require additional details from OP on how to process duplicate and/or overlapping patterns (eg, how many times do 4 and 44 match against 44444444?)
Both approaches generate:
abc$(SEQ)asdasd asd
efg hij$(SEQ)asdasdasd$(SEQ)zzzzzz asd
op$(SEQ)44444444 444
qrs$(SEQ)777 777
qrs$(SEQ)777444 444,777
asdsd777hdhfgjdfasd asd,777
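As a sketch of the extra coding the notes refer to, the variant below stores the patterns in a numerically indexed array, so they are always reported in pattern-file order, and counts non-overlapping occurrences of each pattern scanning left to right (so 44444444 counts as two hits for 444). The counting rule, the pattern(count) output format and the function name grep_with_counts are assumptions for illustration, not something OP specified; overlapping patterns such as 4 and 44 are still each counted independently.

```shell
#!/bin/sh
# Hypothetical variant of the awk answer: ordered patterns plus occurrence counts.
# $1 = pattern file, $2 = file to search.
grep_with_counts() {
    awk '
    BEGIN { sep1 = "\t"; sep2 = "," }
    FNR == NR { if ($0 != "") ptns[++n] = $0; next }    # 1st file: keep non-blank patterns in file order
    {
        sfx = ""
        for (i = 1; i <= n; i++) {                        # numeric loop gives a deterministic order
            ptn = ptns[i]; cnt = 0; rest = $0
            while ((pos = index(rest, ptn)) > 0) {        # count non-overlapping hits, left to right
                cnt++
                rest = substr(rest, pos + length(ptn))
            }
            if (cnt > 0)
                sfx = sfx (sfx == "" ? "" : sep2) ptn "(" cnt ")"
        }
        if (sfx != "")
            print $0 sep1 sfx
    }
    ' "$1" "$2"
}
```

Usage: grep_with_counts pattern.txt file.txt. With the sample files, the line op$(SEQ)44444444 should come out as op$(SEQ)44444444 <TAB> 444(2), and qrs$(SEQ)777444 as qrs$(SEQ)777444 <TAB> 444(1),777(1).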