I would like to extract text that is spread across two lines.
For example:
PO Number Dept.number
4000813852 7
I would like to get the PO Number, 4000813852. The data is laid out like a table, but within the whole document it appears as ordinary text.
I have tried re.MULTILINE with a pattern
like r'PO Number.*\n[0-9]+'
It works in this case, but it is not the best solution, because PO Number may appear in the middle of the header line, as in:
Invoice Number PO Number Dept.number
123456666 4000813852 7
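You can do this with two capture groups and the re.DOTALL
option enabled, so that "." also matches newlines. Because ".*" is greedy, the expression captures the last run of 10 digits after "PO Number", so it assumes that the number you are interested in is the only one with 10 digits in your text.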
The expression is:
(PO\sNumber).*(\d{10})
Python snippet:
import re

first_string = """PO Number Dept.number
4000813852 7"""
second_string = """Invoice Number PO Number Dept.number
123456666 4000813853 7"""

# re.DOTALL lets "." match newlines, so ".*" can span from the header line to the value line.
pattern = re.compile(r'(PO\sNumber).*(\d{10})', re.DOTALL)

PO_first = pattern.search(first_string)
print(PO_first.group(1) + " " + PO_first.group(2))
PO_second = pattern.search(second_string)
print(PO_second.group(1) + " " + PO_second.group(2))
Output:
PO Number 4000813852
PO Number 4000813853
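
If the PO number is not always 10 digits, or other 10-digit numbers may appear in the text, one possible alternative is to look the value up by column position instead. The following is only a sketch under some assumptions: the list of header names (KNOWN_HEADERS) and the helper value_under are hypothetical and not part of the original question, the values are assumed to sit on the line directly below the header line, and no value contains whitespace.

```python
import re

# Hypothetical list of the column headers expected in the document (an assumption).
KNOWN_HEADERS = ["Invoice Number", "PO Number", "Dept.number"]


def value_under(header, text, known_headers=KNOWN_HEADERS):
    """Return the token below `header` on the following line, or None if not found."""
    # One alternation that recognises any known header, including ones with spaces.
    header_re = re.compile("|".join(re.escape(h) for h in known_headers))
    lines = text.splitlines()
    for i, line in enumerate(lines[:-1]):
        columns = header_re.findall(line)
        if header in columns:
            # Assumes values are whitespace-separated and line up with the headers by position.
            values = lines[i + 1].split()
            idx = columns.index(header)
            if idx < len(values):
                return values[idx]
    return None


first_string = """PO Number Dept.number
4000813852 7"""
second_string = """Invoice Number PO Number Dept.number
123456666 4000813853 7"""

print(value_under("PO Number", first_string))
print(value_under("PO Number", second_string))
```

This trades the 10-digit assumption for a different one: the header names must be known in advance, and every column must have a value on the next line.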