I have the following problem. My file looks like this:
1082016051300000010005690902BCDEΔ0204366221002201612052016-00001274448A
1082016051300000010034397704EDFG10865125102001201626042016-000001028134
1082016051300000010068901401RADJ34835974123112201528042016-000001343290
1082016051300000010068901401RADJ34835974103112201528042016-000000910290
1082016051300000010095474301RADJ13453401102812201525042016-00000208995K
1082016051300000010098429002RADJΤ1052947211312201218042016-000034021290
1032016051300000010095474301RADJ13453401102812201525042016-00000208995K
1032016051300000010098429002RADJΤ1052947211312201218042016-000034021290
and I want to print only the lines that match two patterns: the first is in columns 2-3 (08), and the second is the word RAD, which can appear anywhere on the line. I've tried to do this with grep:
grep -o '.[0-1][1-8]*RAD' FILEIN
and the only response I get is that FILEIN is a binary file. I've also tried with sed:
sed -n '/[0-1][1-8]*RAD/p' FILEIN
but I have a feeling the * is not expanded. I've managed to make it work by looking for the two patterns in succession, like:
sed -n '/RAD/p' FILEIN | sed -n '/^108/p'
and this works, but the file I'll be using as input is potentially huge, and I'm not sure that piping one stream into another is time-efficient. Could someone help me? Awk or Perl are welcome too.
You can add an -a option to grep to force it to read a file as text.
If the order of the patterns is fixed, the correct regex to allow zero or more characters between them is .* , not just * (a bare * only repeats the preceding character class).
grep -a '[0-1][1-8].*RAD' files...
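If the 08 really has to sit in columns 2-3, the pattern can be anchored to the start of the line. This is only a sketch: the temporary sample file is made up from three ASCII-only records in the question, and it assumes one byte per column (hence LC_ALL=C).

```shell
# Hypothetical sample file built from three records in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1082016051300000010034397704EDFG10865125102001201626042016-000001028134
1082016051300000010068901401RADJ34835974123112201528042016-000001343290
1032016051300000010095474301RADJ13453401102812201525042016-00000208995K
EOF

# ^. skips column 1, 08 must occupy columns 2-3, RAD may follow anywhere.
matched=$(LC_ALL=C grep -a '^.08.*RAD' "$sample")
printf '%s\n' "$matched"
rm -f "$sample"
```

Only the second record should be printed: the first has no RAD, and the third has 03 in columns 2-3.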
sed is a scripting language; you can combine multiple conditions and actions easily.
sed -n '/regex1/!d;/regex2/p' files...
(If no match on the first regex, delete this line and take the next one. Otherwise, if it matches the second regex, print.)
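With the patterns from the question filled in, that might look like the following. Again a sketch: the sample file is invented from the question's records, and /^.08/ is my reading of "08 in columns 2-3".

```shell
# Hypothetical sample file built from three records in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1082016051300000010034397704EDFG10865125102001201626042016-000001028134
1082016051300000010068901401RADJ34835974123112201528042016-000001343290
1032016051300000010095474301RADJ13453401102812201525042016-00000208995K
EOF

# Drop lines without 08 in columns 2-3; print the rest only if they contain RAD.
matched=$(LC_ALL=C sed -n '/^.08/!d;/RAD/p' "$sample")
printf '%s\n' "$matched"
rm -f "$sample"
```

This reads the file once in a single sed process, so there is no second process in a pipeline to worry about.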
The same is also easy -- perhaps even easier -- in Awk.
awk '/regex1/ && /regex2/' files...
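Since the records look fixed-width, Awk can also test the columns directly instead of using a regex. A minimal sketch, assuming one byte per column (hence LC_ALL=C) and using an invented sample file made of records from the question:

```shell
# Hypothetical sample file built from four records in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1082016051300000010034397704EDFG10865125102001201626042016-000001028134
1082016051300000010068901401RADJ34835974123112201528042016-000001343290
1082016051300000010095474301RADJ13453401102812201525042016-00000208995K
1032016051300000010095474301RADJ13453401102812201525042016-00000208995K
EOF

# substr($0, 2, 2) is columns 2-3; index() is non-zero when RAD occurs anywhere.
matched=$(LC_ALL=C awk 'substr($0, 2, 2) == "08" && index($0, "RAD")' "$sample")
count=$(printf '%s\n' "$matched" | wc -l)
printf '%s\n' "$matched"
rm -f "$sample"
```

Here the second and third records are printed. The string comparison avoids accidentally matching an 08 elsewhere on the line, which the unanchored regex version would accept.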