javascript regex regex-group backreference named-captures

Regular expression for extracting a noun in variable order


I have the following text:

Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.

where the nouns are Toni Kroos, Real Madrid (team1) and Rayo Vallecano (team2).

I need a regular expression with a named capturing group that returns these results given the following variations:

Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.

Expected result: Rayo Vallecano

Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.

Expected result: Rayo Vallecano

My naive idea was to negate the backreference to team1 and use it on the second sentence, so that when the pattern is about to match Real Madrid or Rayo Vallecano, it discards Real Madrid because it is the same value as team1, and team2 returns Rayo Vallecano. No luck so far with something like the following (it only works on the first example):

^Action by .*\, (?<team1>.*)\. (?!\1)(?<team2>.*)( \d+\,| \d+\.).

In plain English, I expect a regex that picks either the first noun or the second one in the second sentence (after the first .), so team2 would be either Real Madrid or Rayo Vallecano in the examples, and then discards the one that matches the named capturing group team1 (Real Madrid in the example). That way the order of the nouns in the second sentence wouldn't matter.

I'm no expert with regular expressions, so I'm not sure whether this can be achieved with a single pattern that fits both examples. Is it possible to write such an expression? If so, I would appreciate the solution with an explanation of the pattern used. Thanks in advance.

EDIT: The language I'll be using is JavaScript.


Solution

  • You can write the pattern using \1 to refer back to the first capture group (team1), so that the named groups team1 and team2 are each defined only once.

    ^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\1[^,]*, )?(?<team2>[^,]+) \d+[,.]
    

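    Explanation

      • ^ asserts the start of the string
      • Action by [^,]*, matches the literal text, then any characters other than a comma (the player name), then a comma and a space
      • (?<team1>[^.]+) is the named group team1 and matches 1 or more characters other than a dot
      • [.,] matches either a dot or a comma, and is followed by a space
      • (?:\1[^,]*, )? is an optional non-capturing group: if the second sentence starts with the same text as team1, it consumes that text, the score after it, and the comma and space
      • (?<team2>[^,]+) is the named group team2 and matches 1 or more characters other than a comma
      • \d+[,.] is preceded by a space and matches 1 or more digits followed by a comma or a dot, which makes team2 backtrack so that it stops before the score

    See a regex101 demo.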

    const regex = /^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\1[^,]*, )?(?<team2>[^,]+) \d+[,.]/;
    [
      `Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.`,
      `Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.`
    ].forEach(s => {
      const m = s.match(regex);
      if (m) {
        console.log(m.groups);
      }
    });
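
As a possible variation on the snippet above, the numbered backreference \1 can be swapped for the named backreference \k<team1>, which JavaScript supports in patterns that contain named groups. This is only a sketch: the helper name extractTeams and the choice to return null on a failed match are assumptions made for illustration, not part of the original answer.

```javascript
// Same pattern as in the answer, but \1 is replaced by the named
// backreference \k<team1>, so the pattern does not depend on group numbering.
const teamRegex = /^Action by [^,]*, (?<team1>[^.]+)[.,] (?:\k<team1>[^,]*, )?(?<team2>[^,]+) \d+[,.]/;

// Hypothetical helper: returns both team names, or null when the
// sentence does not follow the expected shape.
function extractTeams(text) {
  const m = text.match(teamRegex);
  return m ? { team1: m.groups.team1, team2: m.groups.team2 } : null;
}

const first = extractTeams("Action by Toni Kroos, Real Madrid. Rayo Vallecano 2, Real Madrid 0.");
const second = extractTeams("Action by Toni Kroos, Real Madrid. Real Madrid 0, Rayo Vallecano 2.");

console.log(first.team2);  // Rayo Vallecano
console.log(second.team2); // Rayo Vallecano
```

Because the optional group only skips team1 when it appears first in the second sentence, the order of the two teams there does not change the value captured in team2.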