Here is a challenge for regex gurus. I need a very simple sed expression to select text between markers.
Here is an example text. Note that it can contain any special characters, tabs and whitespace, even though this example doesn't show all possible combinations.
^[[200~a^[[200~aaa aM1bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hhM3kkkkk~
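Expected output 1, from the first M1 to the last M3:
bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hh
Expected output 2, from the last M1 to the first M3:
ccc[$cM2ddddM2eeeee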
I tried this, but it selects from the last start marker to the last end marker:
echo "^[[200~a^[[200~aaa aM1bb bbbM1ccc[\$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hhM3kkkkk~"|sed -E "s|.*M1(.*)M3.*$|\1|g"
ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hh
How can this be done? A single sed regex would be best. By single regex I mean one for each of the two requirements above, i.e. two regexes in total. I also need the equivalent Python re expressions.
The second case is easy, even with sed:
$ a='^[[200~a^[[200~aaa aM1bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hhM3kkkkk~'
$ sed -E 's/.*M1|M3.*//g' <<< "$a"
ccc[$cM2ddddM2eeeee
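Since the question also asks for a Python equivalent of this second case, here is a minimal sketch with re.sub. Python's .* is greedy like sed's, so the same pattern should carry over unchanged; the variable names are only for illustration.

```python
import re

text = "^[[200~a^[[200~aaa aM1bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hhM3kkkkk~"

# Greedy ".*M1" removes everything up to and including the last M1;
# "M3.*" then removes everything from the first remaining M3 to the end.
last_m1_to_first_m3 = re.sub(r".*M1|M3.*", "", text)
print(last_m1_to_first_m3)  # ccc[$cM2ddddM2eeeee
```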
The first case is more complex because of the greediness of sed regexes. If you can use python or perl instead of sed, you can harness their non-greedy .*? operator:
$ python -c 'import sys,re; print("".join(re.sub(r"^.*?M1(.*)M3.*$",r"\1",l) for l in sys.stdin),end="")' <<< "$a"
bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hh
$ perl -pe 's/^.*?M1(.*)M3.*$/$1/' <<< "$a"
bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hh
A bit shorter with python if you have only one line of text to process and if we pass it as an argument:
$ python -c 'import sys,re; print(re.sub(r"^.*?M1(.*)M3.*$",r"\1",sys.argv[1]))' "$a"
bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hh
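When only the text between the markers is needed, capturing it with re.search avoids the substitution altogether. This is a minimal sketch, assuming the markers are the literal strings M1 and M3 and that the text is a single line (the dot does not match newlines by default); the function names are hypothetical.

```python
import re

def first_m1_to_last_m3(text):
    # The leftmost match starts at the first M1; greedy (.*) extends to the last M3.
    m = re.search(r"M1(.*)M3", text)
    return m.group(1) if m else None

def last_m1_to_first_m3(text):
    # Greedy .* skips ahead to the last M1; lazy (.*?) stops at the first M3 after it.
    m = re.search(r".*M1(.*?)M3", text)
    return m.group(1) if m else None

sample = "^[[200~a^[[200~aaa aM1bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hhM3kkkkk~"
print(first_m1_to_last_m3(sample))
print(last_m1_to_first_m3(sample))
```

Both functions return None when the markers are missing, which a plain substitution would hide by returning the input unchanged.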
With sed, one possibility consists in first inserting separator characters that do not appear in the input string, for instance newlines, and then keeping only what appears between them. If your sed supports \n for newline in the replacement string of the substitute command:
$ sed -E 's/M1(.*)M3/\n\1\n/;s/.*\n(.*)\n.*/\1/' <<< "$a"
bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hh
Otherwise, with any sed:
$ sed -E 's/M1(.*)M3/\
\1\
/;s/.*\n(.*)\n.*/\1/' <<< "$a"
bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hh
Note: as your shell is bash, if you absolutely want a one-liner you can use a $'...' quoted string, in which the shell itself expands \n to a newline and \\ to a backslash:
$ sed -E $'s/M1(.*)M3/\\\n\\1\\\n/;s/.*\\n(.*)\\n.*/\\1/' <<< "$a"
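To see what the two substitutions in the sed commands above do, the separator trick can be traced step by step in Python. This is only an illustrative sketch that mirrors the sed logic with re.sub; it assumes, like the sed version, that the input line contains no newline of its own.

```python
import re

text = "^[[200~a^[[200~aaa aM1bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hhM3kkkkk~"

# Step 1: replace the first M1 and the last M3 with newline separators
# (count=1 mirrors sed's s command without the g flag).
marked = re.sub(r"M1(.*)M3", r"\n\1\n", text, count=1)

# Step 2: keep only what sits between the two separators.
# The dot never matches a newline, so each .* stays on its own side.
result = re.sub(r".*\n(.*)\n.*", r"\1", marked)
print(result)  # bb bbbM1ccc[$cM2ddddM2eeeeeM3ffffff fM3ggggggg M3hhhhh hh
```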