php regex preg-match-all

Regular expression to match part of an optional part of a string


I have a string of words, here broken into lines to better visualize the repeating pattern:

Saint John eats less of those apples.
Saint Paul eats more of those berries.
Saint Luke eats         those oranges.

From this string I need to extract all names, all fruits, and all the quantifiers. The result should be:

Array
(
    [0] => Array
        (
            [0] => Saint John eats less of those apples.
            [1] => John
            [2] => less
            [3] => apples
        )
    [1] => Array
        (
            [0] => Saint Paul eats more of those berries.
            [1] => Paul
            [2] => more
            [3] => berries
        )
    [2] => Array
        (
            [0] => Saint Luke eats those oranges.
            [1] => Luke
            [2] => 
            [3] => oranges
        )
)

I have gotten as far as:

preg_match_all("|Saint (.+?) eats (.+?) of those (.+?).|", $string, $matches);

But this of course doesn't find the last (partial) match. How can I rephrase my regular expression to find it?


Notes

In the real string, there is more non-repeating text before, between, and after the repeating pattern. E.g.:

The apples have worms. That is why Saint John eats less of those apples. Unfortunately Saint John dislikes berries. Unlike Saint Paul. Saint Paul eats more of those berries. When John and Paul are gone, Saint Luke eats those oranges. Afterward, he is still hungry.

Unlike this related question, I don't want to optionally match all of the missing part, but only part of the missing part!


Solution

  • You may use this regex in PHP with a non-capturing optional group:

    ^Saint\h+(\w+)\h+eats(?:\h+(\w+)\h+of)?\h+those\h+(\w+)
    


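    RegEx Details:

      • ^: Start of the string (or of each line with the m flag). Drop it if the sentences sit inside longer text, as in the question's Notes.
      • Saint\h+(\w+)\h+eats: "Saint", then the name captured in group 1, then "eats"; \h+ matches one or more horizontal whitespace characters.
      • (?:\h+(\w+)\h+of)?: Optional non-capturing group that matches the quantifier (captured in group 2) followed by "of".
      • \h+those\h+(\w+): "those", then the fruit captured in group 3.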

    PHP Code Demo (Thanks to @sin)
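
A minimal sketch of how the optional-group pattern might be applied with preg_match_all, assuming the sentences are embedded in running text as in the question's Notes. Two details are assumptions rather than part of the answer's pattern: the leading ^ is dropped so that a match can start mid-string, and PREG_SET_ORDER is passed so that each match becomes one row, as in the desired output. The variable names are illustrative.

```php
<?php
// Sample text from the question's Notes (sentences embedded in other prose).
$string = 'The apples have worms. That is why Saint John eats less of those apples. '
        . 'Unfortunately Saint John dislikes berries. Unlike Saint Paul. '
        . 'Saint Paul eats more of those berries. When John and Paul are gone, '
        . 'Saint Luke eats those oranges. Afterward, he is still hungry.';

// Optional-group pattern without the leading ^ (assumption: matches may start anywhere).
// (?:\h+(\w+)\h+of)? makes the "<quantifier> of" part optional as a single unit.
$pattern = '/Saint\h+(\w+)\h+eats(?:\h+(\w+)\h+of)?\h+those\h+(\w+)/';

// PREG_SET_ORDER groups the results per match instead of per capture group.
preg_match_all($pattern, $string, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    // $m[2] is an empty string when the optional part took no part in the match.
    printf("%s | %s | %s\n", $m[1], $m[2], $m[3]);
}
```

Note that $m[0] does not include the trailing period, because the pattern stops at the fruit; append \. to the pattern if the full sentence is needed.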

    ---

    Here is another solution using the branch reset group, (?|...), which is supported by PCRE (PHP), Perl, and the third-party regex module in Python:

    ^Saint\h+(\w+)\h+eats(?|\h+(\w+)\h+of|())\h+those\h+(\w+)
    

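
To illustrate what the branch reset group changes, here is a small sketch comparing the two patterns on a sentence without a quantifier. It assumes PHP 7.2 or later for the PREG_UNMATCHED_AS_NULL flag; the sample sentence and variable names are hypothetical, and the leading ^ is omitted from both patterns for consistency with text that does not start at "Saint".

```php
<?php
// Hypothetical one-sentence input with no "<quantifier> of" part.
$sentence = 'Saint Luke eats those oranges.';

// Optional-group version: group 2 takes no part in this match.
$optional = '/Saint\h+(\w+)\h+eats(?:\h+(\w+)\h+of)?\h+those\h+(\w+)/';

// Branch-reset version: both alternatives number their capture as group 2,
// and the empty () alternative still sets it (to an empty string).
$branchReset = '/Saint\h+(\w+)\h+eats(?|\h+(\w+)\h+of|())\h+those\h+(\w+)/';

// PREG_UNMATCHED_AS_NULL reports non-participating groups as null.
preg_match($optional, $sentence, $a, PREG_UNMATCHED_AS_NULL);
preg_match($branchReset, $sentence, $b, PREG_UNMATCHED_AS_NULL);

var_dump($a[2]); // NULL
var_dump($b[2]); // string(0) ""
```

Without PREG_UNMATCHED_AS_NULL, both patterns report an empty string for group 2 here, so in plain preg_match_all usage the two solutions yield the same captures.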