I'm writing a parser to extract information from the output given below.
I need to get all three blocks of text that sit between the lines of dashes ('--'), so I wrote the regular expression below:
import re

def parse_ib_write_bw(mystr):
    output = dict()
    # match = re.search('--+\n(\n|.)*?--+', mystr, re.I)
    match = re.search('--+\n(.*)--+(.*)--+(.*)--+', mystr, re.DOTALL)
    if match:
        # group(n) returns the n-th captured block; groups(n) would return all of them
        print(match.group(1))
        print(match.group(2))
        print(match.group(3))

# my_str holds the output shown above
parse_ib_write_bw(my_str)
My understanding is:
--+\n(.*)--+ --> This would give the first block, up to the second line of '--'
(.*)--+ --> This would give the second block, up to the third line of '--'
(.*)--+ --> This would give the third block, up to the final line of '--'
But I get the entire output. Where am I going wrong in my understanding?
Since . matches a newline in DOTALL mode, the first .* matches all the text between the first and the last line of dashes, while the last line of dashes is matched by the latter --+(.*)--+(.*)--+ where the two .* groups each match an empty string.
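To see this behaviour concretely, here is a minimal sketch that runs the pattern from the question on a made-up sample. The real tool output is not shown here, so `sample` and its contents are assumptions chosen purely for illustration:

```python
import re

# Hypothetical stand-in for the tool output: three blocks separated by dash lines.
sample = (
    "------\n"
    "block one\n"
    "------\n"
    "block two\n"
    "------\n"
    "block three\n"
    "------\n"
)

# The pattern from the question, unchanged.
greedy = re.search('--+\n(.*)--+(.*)--+(.*)--+', sample, re.DOTALL)

first, second, third = greedy.groups()
print(repr(first))   # everything between the first and the last dash line
print(repr(second))  # empty string
print(repr(third))   # empty string
```

The first group swallows the two inner dash lines along with all three blocks, and the final dash line alone is split up to satisfy the remaining `--+(.*)--+(.*)--+`.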
You can instead use ^ in MULTILINE mode to assert that each line of dashes begins at the start of a line and is followed by a newline:
re.search('^--+\n(.*)^--+\n(.*)^--+\n(.*)^--+\n', mystr, re.DOTALL | re.MULTILINE)
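As a rough check of the anchored pattern, the sketch below wraps it in a small helper and runs it on a made-up sample. The helper name `parse_blocks` and the sample text are hypothetical, since the real output is not shown here:

```python
import re

# Anchored pattern from the answer: every dash line must start a line and end with a newline.
BLOCKS = re.compile('^--+\n(.*)^--+\n(.*)^--+\n(.*)^--+\n', re.DOTALL | re.MULTILINE)

def parse_blocks(text):
    """Return the three blocks between the dash lines, or None if there is no match."""
    match = BLOCKS.search(text)
    return match.groups() if match else None

# Hypothetical sample with exactly four dash lines.
sample = (
    "-----\n"
    "block one\n"
    "-----\n"
    "block two\n"
    "second line\n"
    "-----\n"
    "block three\n"
    "-----\n"
)

blocks = parse_blocks(sample)
for block in blocks:
    print(repr(block))
```

This relies on the text containing exactly four dash lines. If there could be more, the greedy `.*` would again absorb the extra ones, and a lazy `.*?` in each group is one possible way to stop at the nearest dash line instead.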