I have the following text:
Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.
where the nouns are Toni Kroos, Real Madrid (team1) and Rayo Vallecano (team2).
I need a regular expression with a named capturing group that returns these results given the following variations:
Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.
Expected result: Rayo Vallecano
Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.
Expected result: Rayo Vallecano
My naive intention was to negate the backreference captured in team1 and use it on the second sentence. So when it is about to match Real Madrid or Rayo Vallecano, it would discard Real Madrid, as it is the same value as team1, and team2 would return Rayo Vallecano. No luck so far with something like this (it only works on the first example):
^Action by .*\, (?<team1>.*)\. (?!\1)(?<team2>.*)( \d+\,| \d+\.).
In plain English, I expect the regex to pick either the first noun or the second one in the second sentence (after the first .), so team2 would be either Real Madrid or Rayo Vallecano in the examples, and then discard the one that matches the named capturing group team1 (Real Madrid in the example). That way the order of the nouns in the second sentence wouldn't matter.
I'm no expert with regular expressions, so I'm not sure whether this is possible with a single pattern that fits both examples. Is it possible to write such an expression? If so, I would appreciate the solution with an explanation of the pattern used. Thanks in advance.
EDIT: The language I'll be using is JavaScript
You might write the pattern using \1 to refer to the first capture group, and use the named groups team1 and team2 only once each.
^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\1[^,]*, )?(?<team2>[^,]+) \d+[,.]
Explanation
^
Start of string
Action by [^,]*,
Match Action by followed by optional chars other than a comma, then match , and a space
(?<team1>[^.]+)[.,]
Named group team1: match 1+ chars other than . and then match either . or , followed by a space
(?:\1[^,]*, )?
Optionally match what was captured by group 1 (using a backreference), followed by optional chars other than , and then match , and a space
(?<team2>[^,]+)
Named group team2: match 1+ chars other than , and then match a space
\d+[,.]
Match 1+ digits followed by , or .
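Since the question asks for named groups, the numbered backreference \1 can also be written as the named backreference \k<team1>, which JavaScript supports in patterns that contain named groups. The following is a minimal sketch of that variant; the names teamPattern and extractTeams are hypothetical and only used for illustration.

```javascript
// Same pattern as above, with \k<team1> in place of \1.
const teamPattern = /^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\k<team1>[^,]*, )?(?<team2>[^,]+) \d+[,.]/;

// Hypothetical helper: returns { team1, team2 }, or null when the line does not match.
function extractTeams(line) {
  const match = teamPattern.exec(line);
  return match ? { team1: match.groups.team1, team2: match.groups.team2 } : null;
}

const samples = [
  "Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.",
  "Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.",
];

for (const line of samples) {
  // For both sample lines, team2 should be "Rayo Vallecano".
  console.log(extractTeams(line));
}
```

Returning null for a non-matching line keeps the caller from having to check match.groups itself.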
See a regex101 demo.
const regex = /^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\1[^,]*, )?(?<team2>[^,]+) \d+[,.]/;
[
  `Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.`,
  `Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.`
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    // Log the named groups team1 and team2 for each matching line
    console.log(m.groups);
  }
});
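As for why the attempt in the question only handles the first example: a negative lookahead such as (?!\1) only asserts that the text at the current position does not start with team1. It does not skip over that text. When team1 comes first in the second sentence, the assertion fails right after the ". " and there is no other position for the engine to try, so the whole match fails. A small sketch that runs the question's pattern unchanged on both inputs (the variable names are only illustrative):

```javascript
// The pattern from the question, unchanged.
const attempt = /^Action by .*\, (?<team1>.*)\. (?!\1)(?<team2>.*)( \d+\,| \d+\.)./;

const inputs = [
  "Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.",
  "Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.",
];

// Collect team2 for each input, or null when the pattern does not match at all.
const attemptResults = inputs.map((line) => {
  const m = attempt.exec(line);
  return m ? m.groups.team2 : null;
});

console.log(attemptResults);
```

This should give Rayo Vallecano for the first input and null for the second, which is why the pattern above uses an optional group that consumes team1 and its score instead of a lookahead.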