
Print lines that match two patterns


I have the following problem. My file looks like this:

1082016051300000010005690902BCDEΔ0204366221002201612052016-00001274448A                                                                     
1082016051300000010034397704EDFG10865125102001201626042016-000001028134
1082016051300000010068901401RADJ34835974123112201528042016-000001343290                                                                     
1082016051300000010068901401RADJ34835974103112201528042016-000000910290                                                                     
1082016051300000010095474301RADJ13453401102812201525042016-00000208995K                                                                     
1082016051300000010098429002RADJΤ1052947211312201218042016-000034021290
1032016051300000010095474301RADJ13453401102812201525042016-00000208995K                                                                     
1032016051300000010098429002RADJΤ1052947211312201218042016-000034021290

and I want to print only the lines that match two patterns: the first is in columns 2-3 (08) and the second is the string RAD, which can appear anywhere on the line. I've tried to do this with grep:

grep -o '.[0-1][1-8]*RAD' FILEIN

and the only response I get is that FILEIN is a binary file. I've also tried with sed this:

sed -n '/[0-1][1-8]*RAD/p' FILEIN 

but I have a feeling the * is not expanded. I've managed to make it work by looking for two patterns in succession, like:

sed -n '/RAD/p' FILEIN | sed -n '/^108/p'

and this works, but the file I'll be using as input is potentially huge, and I'm not sure that piping a stream into another is time efficient. Could someone help me? Awk or Perl are welcome too.


Solution

  • You can add an -a option to grep to force it to read a file as text.

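    If the order of the patterns is fixed, the correct regex to allow zero or more characters between them is .*, not just *. In a regex, * is not a shell-style wildcard; it only repeats the item before it, so [1-8]* means "zero or more digits from 1 to 8".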

    grep -a '[0-1][1-8].*RAD' files...
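
    As a rough sketch of how this might look on data shaped like the question's, the example below anchors the first pattern with ^108 (taken from the asker's own working pipeline) instead of the looser bracket expression. The temporary sample file and the variable names are only illustrative, and -a is assumed to be available (it is in GNU and BSD grep, but it is not part of POSIX).

```shell
# Illustrative sample file with three records shaped like those in the question.
sample=$(mktemp)
printf '%s\n' \
  '1082016051300000010034397704EDFG10865125102001201626042016-000001028134' \
  '1082016051300000010068901401RADJ34835974123112201528042016-000001343290' \
  '1032016051300000010095474301RADJ13453401102812201525042016-00000208995K' \
  > "$sample"

# -a : treat the input as text even if grep would classify it as binary.
# ^108 pins the first pattern to the start of the line; .* allows any run
# of characters before RAD.
grep_out=$(grep -a '^108.*RAD' "$sample")
printf '%s\n' "$grep_out"

rm -f "$sample"
```

    Only the second record should be printed: the first has no RAD and the third starts with 103.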
    

    sed is a scripting language; you can combine multiple conditions and actions easily.

    sed -n '/regex1/!d;/regex2/p' files...
    

    (If no match on the first regex, delete this line and take the next one. Otherwise, if it matches the second regex, print.)
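
    Filled in with the question's two patterns, this could look like the sketch below. The sample file and variable names are made up for illustration; the second command is an equivalent nested form that some people find easier to read.

```shell
# Illustrative sample file with three records shaped like those in the question.
sample=$(mktemp)
printf '%s\n' \
  '1082016051300000010034397704EDFG10865125102001201626042016-000001028134' \
  '1082016051300000010068901401RADJ34835974123112201528042016-000001343290' \
  '1032016051300000010095474301RADJ13453401102812201525042016-00000208995K' \
  > "$sample"

# Single sed process: drop lines not starting with 108, then print the
# survivors that also contain RAD. -n suppresses the default printing.
sed_out=$(sed -n '/^108/!d;/RAD/p' "$sample")

# Same filter written as a nested block: only inside lines matching ^108,
# print those that match RAD.
sed_nested_out=$(sed -n '/^108/{/RAD/p;}' "$sample")

printf '%s\n' "$sed_out"
printf '%s\n' "$sed_nested_out"

rm -f "$sample"
```

    Both commands read the file once in a single process, which avoids the pipe between two sed invocations.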

    The same is also easy -- perhaps even easier -- in Awk.

    awk '/regex1/ && /regex2/' files...
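
    A possible concrete version for the question's data is sketched below. The first command uses two regexes; the second is an alternative that checks columns 2-3 literally with substr and looks for RAD with index, on the assumption that the first pattern really is tied to fixed columns. The file and variable names are illustrative only.

```shell
# Illustrative sample file with three records shaped like those in the question.
sample=$(mktemp)
printf '%s\n' \
  '1082016051300000010034397704EDFG10865125102001201626042016-000001028134' \
  '1082016051300000010068901401RADJ34835974123112201528042016-000001343290' \
  '1032016051300000010095474301RADJ13453401102812201525042016-00000208995K' \
  > "$sample"

# Regex form: line starts with 108 and contains RAD somewhere.
awk_out=$(awk '/^108/ && /RAD/' "$sample")

# Column form: the two characters starting at column 2 are exactly "08",
# and index() finds RAD at some position (a nonzero result counts as true).
awk_col_out=$(awk 'substr($0, 2, 2) == "08" && index($0, "RAD")' "$sample")

printf '%s\n' "$awk_out"
printf '%s\n' "$awk_col_out"

rm -f "$sample"
```

    The column form does not care what is in column 1, so it is slightly looser than ^108; pick whichever matches the actual record layout.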