I have a piece of text that is repeated several times. Here is a sample of that text:
The idea is to have a regular expression with three capture groups and apply it to every match in the text. Here is an example of the expected matches:
group1 = HORIZON-CL5-2021-D1-01
group2 (Opening) = 15 Apr 2021
group3 (Deadline(s)) = 07 Sep 2021

group1 = HORIZON-CL5-2022-D1-01-two-stage
group2 (Opening) = 04 Nov 2021
group3 (Deadline(s)) = 15 Feb 2022 (First Stage), 07 Sep 2022 (Second Stage)
I am trying this regular expression:
\n(HORIZON-\S+-[A-Z]{1}\d{1}-\d{2}).*?^Opening
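It almost works. There are two more things I need the regular expression to do:
1. Include the -two-stage suffix in group 1, so that it captures HORIZON-CL5-2022-D1-01-two-stage instead of only HORIZON-CL5-2022-D1-01.
2. Capture the Opening and Deadline(s) values that follow the .*?^Opening part as groups 2 and 3.
I tried extending .*?^Opening, but it does not seem to be correct. How can I solve this?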
To get the -two-stage suffix in group 1, you can append \S* (zero or more non-whitespace characters) to the existing group.
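A quick way to check the effect of \S* is to run both versions of group 1 against the two IDs from the expected matches. This is a sketch using Python's re module (the question does not say which regex engine is in use), and it only tests the ID part, not the whole record.

```python
import re

# The two IDs from the expected matches in the question.
ids = ["HORIZON-CL5-2021-D1-01", "HORIZON-CL5-2022-D1-01-two-stage"]

# Group 1 as written in the question (with the redundant {1} kept).
old_group = re.compile(r"HORIZON-\S+-[A-Z]{1}\d{1}-\d{2}")
# Same group with \S* appended: also takes any trailing non-whitespace suffix.
new_group = re.compile(r"HORIZON-\S+-[A-Z]{1}\d{1}-\d{2}\S*")

old_ids = [old_group.match(s).group(0) for s in ids]
new_ids = [new_group.match(s).group(0) for s in ids]

print(old_ids)  # ['HORIZON-CL5-2021-D1-01', 'HORIZON-CL5-2022-D1-01']
print(new_ids)  # ['HORIZON-CL5-2021-D1-01', 'HORIZON-CL5-2022-D1-01-two-stage']
```

In the old group, \S+ backtracks until -D1-01 can match, and nothing after that is consumed, so -two-stage is left out; \S* picks it up.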
You don't need the s modifier to make the dot match a newline. Instead, you can use a negative lookahead to match all lines that do not start with Opening, then match Opening and capture the opening date and the deadline part in their own capture groups. Use the m modifier so that ^ matches at the start of every line, not only at the start of the text.
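To see why the lookahead matters, the sketch below compares the line-skipping part with and without (?!Opening\b). The sample text is hypothetical: the original sample is not shown, so the layout (an ID line, other lines, then lines starting with "Opening:" and "Deadline(s):") is assumed from the pattern in this answer. Python's re is used for illustration, with re.M playing the role of the m modifier.

```python
import re

# Hypothetical sample text; layout assumed from the answer's pattern.
text = (
    "HORIZON-CL5-2021-D1-01\n"
    "Some title line\n"
    "Opening: 15 Apr 2021\n"
    "Deadline(s): 07 Sep 2021\n"
    "HORIZON-CL5-2022-D1-01-two-stage\n"
    "Another title line\n"
    "Opening: 04 Nov 2021\n"
    "Deadline(s): 15 Feb 2022 (First Stage), 07 Sep 2022 (Second Stage)\n"
)

# Skips whole lines, but only lines that do NOT start with "Opening".
with_lookahead = re.compile(r"^(HORIZON-\S+)(?:\r?\n(?!Opening\b).*)*\r?\nOpening", re.M)

# Same idea without the lookahead: the greedy repetition runs to the end of
# the text and only backtracks as far as the LAST "Opening" line.
without_lookahead = re.compile(r"^(HORIZON-\S+)(?:\r?\n.*)*\r?\nOpening", re.M)

print(with_lookahead.findall(text))
# ['HORIZON-CL5-2021-D1-01', 'HORIZON-CL5-2022-D1-01-two-stage']
print(without_lookahead.findall(text))
# ['HORIZON-CL5-2021-D1-01']
```

Without the lookahead, the first match swallows both records, so the second ID is never reported.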
Note that you can omit {1}, because a token without a quantifier already matches exactly once:
^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)(?:\r?\n(?!Opening\b).*)*\r?\nOpening: (.+)\r?\nDeadline\(s\): (.+)
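A minimal sketch of applying this pattern to a whole document with Python's re.findall, assuming a hypothetical sample layout (the real sample text is not reproduced). The pattern is split into adjacent raw strings only so each part can be commented; concatenated, it is the same pattern as above.

```python
import re

PATTERN = re.compile(
    r"^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)"  # group 1: call ID, including any suffix
    r"(?:\r?\n(?!Opening\b).*)*"        # skip lines that don't start with "Opening"
    r"\r?\nOpening: (.+)"               # group 2: opening date
    r"\r?\nDeadline\(s\): (.+)",        # group 3: deadline(s)
    re.MULTILINE,                       # the m modifier: ^ matches at each line start
)

# Hypothetical sample text; layout assumed from the pattern.
text = (
    "HORIZON-CL5-2021-D1-01\n"
    "Some title line\n"
    "Opening: 15 Apr 2021\n"
    "Deadline(s): 07 Sep 2021\n"
    "HORIZON-CL5-2022-D1-01-two-stage\n"
    "Another title line\n"
    "Opening: 04 Nov 2021\n"
    "Deadline(s): 15 Feb 2022 (First Stage), 07 Sep 2022 (Second Stage)\n"
)

matches = PATTERN.findall(text)
for call_id, opening, deadlines in matches:
    print(call_id, "|", opening, "|", deadlines)
```

In Python the dot also matches \r, so with Windows line endings groups 2 and 3 would keep a trailing \r; calling .strip() on the captured values takes care of that.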
You can make the capture groups that start with a date-like part as specific as you want, since .+ is a broad match.
For example:
^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)(?:\r?\n(?!Opening\b).*)*\r?\nOpening: (\d{2} [A-Z][a-z]{2} \d{4})\r?\nDeadline\(s\): (\d{2} [A-Z][a-z]{2} \d{4}.*)
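The difference between the broad and the stricter pattern shows up on a record whose Opening value is not a date. In this Python re sketch, the second record (the ID HORIZON-CL5-2023-D1-01 and the text "to be announced") is made up purely for illustration, and re.M stands in for the m modifier.

```python
import re

BROAD = re.compile(
    r"^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)(?:\r?\n(?!Opening\b).*)*"
    r"\r?\nOpening: (.+)\r?\nDeadline\(s\): (.+)",
    re.M,
)
STRICT = re.compile(
    r"^(HORIZON-\S+-[A-Z]\d-\d{2}\S*)(?:\r?\n(?!Opening\b).*)*"
    r"\r?\nOpening: (\d{2} [A-Z][a-z]{2} \d{4})"
    r"\r?\nDeadline\(s\): (\d{2} [A-Z][a-z]{2} \d{4}.*)",
    re.M,
)

text = (
    "HORIZON-CL5-2021-D1-01\n"
    "Opening: 15 Apr 2021\n"
    "Deadline(s): 07 Sep 2021\n"
    "HORIZON-CL5-2023-D1-01\n"   # hypothetical record for illustration
    "Opening: to be announced\n"
    "Deadline(s): 07 Sep 2023\n"
)

print(BROAD.findall(text))   # both records
print(STRICT.findall(text))  # only the first record
```

The broad .+ version accepts any text after "Opening: ", while the stricter one only returns records where the opening value is a date like 15 Apr 2021 and the deadline part starts with one.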