Tags: php, regex, preg-match-all, overlapping-matches

Issues with regex for overlapping matches


In short, I'm trying to match the longest, right-most substring of a string that fits this pattern:

[0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]

Consider, for example, the string "abc 1.5 28.00". I want to match "5 28.00".

Using the pattern as-is, like so:

preg_match_all('/[0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]/', 'abc 1.5 28.00', $result);

we instead get the following matches:

[0] => 1.5 2
[1] => 8.00
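The effect is easier to see if the same call is made with the standard `PREG_OFFSET_CAPTURE` flag, which reports the byte offset at which each match starts. A minimal sketch using the subject and pattern from above:

```php
<?php
// Same pattern and subject as above; PREG_OFFSET_CAPTURE turns each entry of
// $result[0] into a [matched text, byte offset] pair.
$pattern = '/[0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]/';
$subject = 'abc 1.5 28.00';

preg_match_all($pattern, $subject, $result, PREG_OFFSET_CAPTURE);

foreach ($result[0] as [$text, $offset]) {
    $end = $offset + strlen($text);
    printf("'%s' starts at %d, ends at %d\n", $text, $offset, $end);
}
```

This prints `'1.5 2' starts at 4, ends at 9` and `'8.00' starts at 9, ends at 13`: the second search picks up exactly where the first match ended, so no match can ever start at the "5" (offset 6) or the "2" (offset 8).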

There is no "5 28.00", or even "28.00", because preg_match_all never returns overlapping matches: once "1.5 2" has been consumed, the search resumes at "8.00".

I did some research, and people suggested using a positive lookahead for problems like this, so I tried the following:

preg_match_all('/(?=([0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]))/', 'abc 1.5 28.00', $result);

which gives these captures (in $result[1]):

[0] => 1.5 2
[1] => 5 28.00
[2] => 28.00
[3] => 8.00
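Because the lookahead is zero-width, every full match in `$result[0]` is an empty string and the actual text lives in capture group 1. A small sketch that makes the two arrays visible:

```php
<?php
// The lookahead consumes nothing, so each full match ($result[0]) is empty;
// the overlapping candidates are captured by group 1 ($result[1]).
$pattern = '/(?=([0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]))/';
$subject = 'abc 1.5 28.00';

preg_match_all($pattern, $subject, $result);

var_dump($result[0]);   // the zero-width full matches
print_r($result[1]);    // the candidates listed above
```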

Now "5 28.00" is in there, which is good, but it can't be reliably identified as the correct match: you can't just walk back from the end of the list looking for the longest match, because a longer match could have appeared earlier in the string. Ideally, I'd want the sub-matches at the end (indexes 2 and 3) not to be there, so that I could simply take the last index.
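One possible workaround (not from the original thread) is to keep the lookahead pattern but also request offsets with `PREG_OFFSET_CAPTURE`. Each candidate then knows where it ends, so the one ending furthest right can be picked directly, with ties going to the candidate that starts earliest, i.e. the longest. A minimal sketch; `rightmostLongest` is a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: among all (overlapping) candidates, return the longest
 * one of those that end furthest right, or null if nothing matches.
 */
function rightmostLongest(string $subject): ?string
{
    // Zero-width lookahead so overlapping candidates are all reported in group 1.
    $pattern = '/(?=([0-9][0-9\s]*(\.|,)\s*[0-9]\s*[0-9]))/';
    if (!preg_match_all($pattern, $subject, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }

    $best = null;
    $bestEnd = -1;
    foreach ($m[1] as [$text, $offset]) {
        $end = $offset + strlen($text);
        // Candidates arrive left to right, so the first one seen with a given
        // end starts earliest (is longest); only a strictly larger end replaces it.
        if ($end > $bestEnd) {
            $best = $text;
            $bestEnd = $end;
        }
    }
    return $best;
}

echo rightmostLongest('abc 1.5 28.00'), PHP_EOL;     // 5 28.00
echo rightmostLongest('abc 1.5 28.00999'), PHP_EOL;  // 5 28.00
```

Under this rule the second edit example, "abc 500000.05.00", gives "05.00" rather than the "5.00" asked for (the same result the answer reports), so that example would need some further rule that the question does not spell out.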

Does anyone have ideas for how to accomplish exactly what I need in the simplest/best way possible? Let me know if I need to clarify anything as I know this stuff can get confusing, and many thanks in advance.

Edit: some additional input/match examples:

"abc 1.5 28.00999" => "5 28.00" (i.e. the pattern can't simply be anchored to the end of the string with $)

"abc 500000.05.00" => "5.00"


Solution

  • The closest I can get for you is the following pattern:

    ((?:\d\s*)+[.,](?:\s*\d){2})(?:(?![.,](?:\s*\d){2}).)*$
    

    It produces the following output (look at key 1, the first capture group, in each case):

    'abc 1.5 28.00999' => array (
      0 => '5 28.00999',
      1 => '5 28.00',
    )
    'abc 500000.05.00' => array (
      0 => '05.00',
      1 => '05.00',
    )
    'abc 111.5 8.0c 6' => array (
      0 => '111.5 8.0c 6',
      1 => '111.5 8',
    )
    'abc 500000.05.0a0' => array (
      0 => '500000.05.0a0',
      1 => '500000.05',
    )
    'abc 1.5 28.00999 6  0 0.6 6' => array (
      0 => '00999 6  0 0.6 6',
      1 => '00999 6  0 0.6 6',
    )
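The answer gives only the pattern and its output. A minimal sketch of how those results could be reproduced, assuming the pattern is applied with `preg_match` (one match per input) and that capture group 1 is the value of interest. The pattern is unchanged, only spread out with the `x` modifier so each part can carry a comment:

```php
<?php
// The answer's pattern, rewritten with the x (extended) modifier so that
// whitespace is ignored and # starts a comment. Behaviour is unchanged.
$pattern = '~
    (                        # group 1: the number that is kept
        (?:\d\s*)+           #   digits, optionally separated by whitespace
        [.,]                 #   decimal separator
        (?:\s*\d){2}         #   exactly two more digits, whitespace allowed
    )
    (?:                      # the rest of the string must never contain
        (?![.,](?:\s*\d){2}) #   another separator followed by two digits
        .
    )*
    $
~x';

$inputs = [
    'abc 1.5 28.00999',
    'abc 500000.05.00',
    'abc 111.5 8.0c 6',
    'abc 500000.05.0a0',
    'abc 1.5 28.00999 6  0 0.6 6',
];

$results = [];
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $m)) {
        $results[$input] = $m[1];   // group 1 is the part of interest
    }
}
var_export($results);
```

Because the tail is anchored with `$`, the leftmost start from which no further "separator plus two digits" follows wins, which is why the last input returns the long run "00999 6  0 0.6 6".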