
Matching substrings with PHP preg_match_all()


I'm attempting to create a lightweight BBCode parser without hardcoding a separate regex for each element. My approach uses preg_replace_callback() to process each match in a callback function.

The idea is simple (though it's proving frustrating): capture the element's name in a regex group, then handle each element differently with a switch inside the callback.

Here is my regex pattern:

'~\[([a-z]+)(?:=(.*))?(?: (.*))?\](.*)(?:\[/\1\])~siU'
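To see which group holds what, here is a small check of this pattern against two sample tags. The sample strings are assumptions chosen for illustration. Note that the U modifier makes every quantifier lazy, and that preg_match() reports a group that didn't take part in the match as an empty string when a later group did.

```php
<?php
// Shows what each capture group of the question's pattern holds.
// The two sample strings are illustrative assumptions.
$pattern = '~\[([a-z]+)(?:=(.*))?(?: (.*))?\](.*)(?:\[/\1\])~siU';

$samples = [
    '[color=red]text[/color]',
    '[img width="100"]pic.png[/img]',
];

$groups = [];
foreach ($samples as $sample) {
    preg_match($pattern, $sample, $m);
    // $m[1] = tag name, $m[2] = value after "=", $m[3] = text after a space,
    // $m[4] = content. A group that did not participate is returned as ''.
    $groups[] = $m;
    var_dump($m);
}
```

So $matches[1] and $matches[4] in the callback are the tag name and the content, while $matches[2] and $matches[3] carry the optional values.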

And here is the preg_replace_callback() call I'm testing with:

return preg_replace_callback(
  '~\[([a-z]+)(?:=(.*))?(?: (.*))?\](.*)(?:\[/\1\])~siU', 
  function($matches) {
    var_dump($matches);
    return "<".$matches[1].">".$matches[4]."</".$matches[1].">";
  },
  $this->raw
);
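The switch-per-element idea might look something like the sketch below. It rests on assumptions: the tag set, the `<span style="...">` markup for [color], and the function name renderElement() are hypothetical, and it uses the lazy pattern from the solution at the end rather than the U-modifier pattern above.

```php
<?php
// Hypothetical sketch: dispatch on the captured tag name ($m[1]) with a switch.
// The supported tags and the HTML they produce are assumptions for illustration.
function renderElement(array $m): string
{
    $tag     = strtolower($m[1]); // element name, e.g. "b" or "color"
    $value   = $m[2] ?? '';       // value after "=", e.g. "red" in [color=red]
    $content = $m[4] ?? '';       // text between the opening and closing tags
    // Note: $content is inserted as-is; escaping the raw input is left out here.

    switch ($tag) {
        case 'b':
        case 'i':
        case 'u':
            return "<$tag>$content</$tag>";
        case 'color':
            // Escape the value before placing it in an attribute.
            return '<span style="color:' . htmlspecialchars($value, ENT_QUOTES) . '">'
                . $content . '</span>';
        default:
            // Unknown element: leave the original BBCode untouched.
            return $m[0];
    }
}

$pattern = '~\[([a-z]+)(?:=([^]\s]*))?(?: ([^[]*))?\](.*?)(?:\[/\1\])~si';

$html = preg_replace_callback($pattern, 'renderElement', '[color=red]warning[/color] and [b]bold[/b]');

echo $html, "\n"; // <span style="color:red">warning</span> and <b>bold</b>
```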

This one issue has me stumped: the pattern doesn't match recursively. Once it matches an element, it doesn't match any elements nested inside it.

Take this BBCode for instance:

[i]This is all italics along with a [b]bold[/b].[/i]

This only matches the [i] element and none of the elements inside it, so the output is:

<i>This is all italics along with a [b]bold[/b].</i>

preg_match_all() confirms this: it reports only the outer match. I've tried experimenting with greedy and lazy quantifiers and with different pattern modifiers, without success.
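This behavior follows from how PCRE scans: preg_match_all() and preg_replace_callback() resume searching after the end of each match, and the replacement text is never rescanned, so the nested [b] inside the consumed [i] element is never tried on its own. A quick reproduction with the example string and the pattern above:

```php
<?php
// Reproduces the problem: a single pass over the example string.
$bbcode  = '[i]This is all italics along with a [b]bold[/b].[/i]';
$pattern = '~\[([a-z]+)(?:=(.*))?(?: (.*))?\](.*)(?:\[/\1\])~siU';

// Only the outer [i] element is found; scanning resumes after its end.
$found = preg_match_all($pattern, $bbcode, $matches);

$html = preg_replace_callback(
    $pattern,
    function (array $m): string {
        return '<' . $m[1] . '>' . $m[4] . '</' . $m[1] . '>';
    },
    $bbcode,
    -1,     // no limit on replacements
    $count  // receives the number of replacements made
);

echo $found, "\n"; // 1
echo $count, "\n"; // 1
echo $html, "\n";  // <i>This is all italics along with a [b]bold[/b].</i>
```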

How can I solve this?


Solution

  • Thanks to @Casimir et Hippolyte's comment, I was able to solve this with a do-while loop and the $count parameter of preg_replace_callback(), as they suggested: the replacement is repeated until a pass makes no replacements.

    Simpler regexes don't work for me because I want to support values in the tags, such as [color=red] or [img width=""].

    Here is the finalized code. It isn't perfect, but it works.

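    $str = $this->raw;
    do {
      // One left-to-right pass; elements nested inside a replaced element
      // stay as BBCode and are handled by the next pass.
      $str = preg_replace_callback(
        '~\[([a-z]+)(?:=([^]\s]*))?(?: ([^[]*))?\](.*?)(?:\[/\1\])~si', 
        function($matches) {
          // $matches[1] = tag name, [2] = value after "=", [3] = attributes, [4] = content
          return "<".$matches[1].">".$matches[4]."</".$matches[1].">";
        },
        $str,
        -1,     // no limit on replacements per pass
        $count  // receives the number of replacements made in this pass
      );
    } while ($count); // stop once a pass makes no replacements
    return $str;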
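For testing outside the class, here is a standalone sketch of the same loop. The function name bbcodeToHtml() is hypothetical; the pattern and callback are the ones from the code above.

```php
<?php
// Standalone version of the do-while approach (function name is hypothetical).
function bbcodeToHtml(string $str): string
{
    $pattern = '~\[([a-z]+)(?:=([^]\s]*))?(?: ([^[]*))?\](.*?)(?:\[/\1\])~si';
    do {
        // Each pass replaces the non-overlapping elements it finds scanning left
        // to right; anything nested inside a replaced element stays as BBCode
        // until the next pass. $count receives the number of replacements.
        $str = preg_replace_callback(
            $pattern,
            function (array $m): string {
                // $m[2] (value after "=") and $m[3] (attributes) are ignored here.
                return '<' . $m[1] . '>' . $m[4] . '</' . $m[1] . '>';
            },
            $str,
            -1,
            $count
        );
    } while ($count > 0); // stop once a pass changes nothing
    return $str;
}

echo bbcodeToHtml('[i]This is all italics along with a [b]bold[/b].[/i]'), "\n";
// <i>This is all italics along with a <b>bold</b>.</i>
```

A single-pass alternative would be to call the parser recursively on $matches[4] inside the callback. With the lazy (.*?), though, a repeated tag such as [b][b]x[/b][/b] is split at the first [/b]; the loop recovers from that on a later pass, whereas a single recursive pass leaves a stray [/b].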