php regex bbcode

Multidimensional BBCode


I am trying to write a BBCode parser in PHP.

I have the following regex:

\[quote\](.*?)\[\/quote\]

Each match should be replaced with:

<div class='quote'><div class='quotetext'>$1</div></div>

This works perfectly until I have a "multidimensional" (nested) post. Example:

[quote] [quote] [quote] text [/quote] [/quote] [/quote]

This should produce the following output:

<div class='quote'><div class='quotetext'>
      <div class='quote'><div class='quotetext'>
           <div class='quote'><div class='quotetext'>
           text
           </div></div>
      </div></div>
</div></div>

Right now I get the following output instead:

<div class='quote'><div class='quotetext'> [quote] [quote] text </div></div> [/quote] [/quote]

PHP:

preg_replace("/\[quote\](.*?)\[\/quote\]/", "<div class='quote'><div class='quotetext'>$1</div></div>", $text); 

I hope someone can help me with this issue. Thanks.


Solution

  • A regex approach in one pass:

    1. Build an array that maps each bbcode tag to the corresponding HTML code.
    2. Write a pattern that matches quote bbcode tags, nested or not. This serves two purposes: it handles nesting, and it extracts only the valid (balanced) parts, which can then be replaced safely.
    3. Do a simple replacement with strtr inside a callback function, using the associative array.

    Pro: this is relatively fast, since it needs only one pass and relies on strtr.
    Con: it isn't flexible, because it only takes into account tags written exactly as [quote], not [quote param="bidule"] or [QUOTE]. (However, nothing prevents you from writing a more elaborate callback function and changing the pattern a little.)

    $corr = [
        '[quote]' => '<div class="quote"><div class="quotetext">',
        '[/quote]' => '</div></div>'
    ];
    
    $pat = '~ \[quote]
              # anything that is not a quote tag
              (?<content> [^[]*+ (?: \[ (?! /?quote] ) [^[]* )*+ )
              # optional recursion: (?R) refers to the whole pattern
              (?: (?R) (?&content) )*+
              \[/quote]
            ~x';
    
    // $str is the input text; the arrow function (fn) requires PHP 7.4 or later.
    $result = preg_replace_callback($pat, fn($m) => strtr($m[0], $corr), $str);
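
To try the one-pass approach on the input from the question, the snippet above can be wrapped in a small function. This is only a sketch: the function name quotesToHtml and the sample strings are made up for illustration, while the pattern and the strtr map are the ones shown above.

```php
<?php
// Hypothetical helper name; pattern and map are taken from the answer as-is.
function quotesToHtml(string $str): string
{
    $corr = [
        '[quote]'  => '<div class="quote"><div class="quotetext">',
        '[/quote]' => '</div></div>',
    ];

    $pat = '~ \[quote]
              (?<content> [^[]*+ (?: \[ (?! /?quote] ) [^[]* )*+ )
              (?: (?R) (?&content) )*+
              \[/quote]
            ~x';

    // Only balanced [quote]...[/quote] blocks match, so strtr only ever
    // sees tags that have a partner.
    return preg_replace_callback($pat, fn($m) => strtr($m[0], $corr), $str);
}

// Nested input from the question: all three levels are converted in one pass.
echo quotesToHtml('[quote] [quote] [quote] text [/quote] [/quote] [/quote]'), "\n";

// An opening tag without a closing tag is left untouched.
echo quotesToHtml('[quote] never closed'), "\n";
```

Because the pattern only matches balanced blocks, a stray opening tag in front of a well-formed quote stays as plain text while the well-formed part is still converted.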
    

  • A more classical approach with several passes:

    1. Build a pattern that forbids nested quote tags inside the match, so that only the innermost tags are replaced.
    2. Put the replacement in a loop and stop when there are no more tags to replace (the count parameter of preg_replace tells you that).

    $pat = '~ \[quote] ( [^[]*+ (?: \[ (?! /? quote] ) [^[]* )*+ ) \[/quote] ~x';
    $repl = '<div class="quote"><div class="quotetext">$1</div></div>';
    
    $result = $str;
    $count = 0;
    
    // Each pass replaces the innermost quotes; stop once a pass replaces nothing.
    do {
        $result = preg_replace($pat, $repl, $result, -1, $count);
    } while ($count);
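
The loop can be checked on a small nested input. The sketch below wraps it in a function; the name replaceQuotesInPasses and the $passes out-parameter are illustrative additions, not part of the original code, and exist only to show how many times preg_replace runs.

```php
<?php
// Hypothetical wrapper around the multi-pass loop; $passes is an illustrative
// extra that reports how many times preg_replace was called.
function replaceQuotesInPasses(string $str, ?int &$passes = null): string
{
    $pat  = '~ \[quote] ( [^[]*+ (?: \[ (?! /? quote] ) [^[]* )*+ ) \[/quote] ~x';
    $repl = '<div class="quote"><div class="quotetext">$1</div></div>';

    $passes = 0;
    do {
        // Only quotes that contain no other quote tag can match here.
        $str = preg_replace($pat, $repl, $str, -1, $count);
        $passes++;
    } while ($count);

    return $str;
}

$html = replaceQuotesInPasses('[quote] a [quote] b [/quote] [/quote]', $n);
echo $html, "\n";
echo "passes: $n\n"; // nesting depth 2, so 3 passes (the last one replaces nothing)
```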
    

    Pro: more flexible than the first approach, since you can easily change the pattern and the replacement string.
    Con: clearly slower, since you need n+1 passes, where n is the maximum nesting level (the last pass replaces nothing and ends the loop).
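
As an illustration of that flexibility, here is a sketch of the multi-pass loop extended to accept upper-case tags and an optional author. The [quote=Name] syntax, the function name and the "quoteauthor" class are assumptions made for this example; none of them come from the question.

```php
<?php
// Assumptions: [quote=Name] syntax and the "quoteauthor" class are illustrative.
function replaceQuotesWithAuthor(string $str): string
{
    // i flag: [QUOTE] matches too. Group 1 is the optional author,
    // group 2 the quote body, which may not contain another quote tag.
    $pat = '~ \[quote (?: = ( [^\[\]]* ) )? ]
              ( [^[]*+ (?: \[ (?! /?quote (?: = [^\[\]]* )? ] ) [^[]* )*+ )
              \[/quote]
            ~xi';

    $render = function (array $m): string {
        $author = $m[1] !== ''
            ? '<div class="quoteauthor">' . htmlspecialchars($m[1]) . ' wrote:</div>'
            : '';
        return '<div class="quote">' . $author
             . '<div class="quotetext">' . $m[2] . '</div></div>';
    };

    // Same stop condition as before: loop until a pass replaces nothing.
    do {
        $str = preg_replace_callback($pat, $render, $str, -1, $count);
    } while ($count);

    return $str;
}

echo replaceQuotesWithAuthor('[QUOTE=Bob] outer [quote] inner [/quote] [/QUOTE]'), "\n";
```

Only the pattern and the callback changed; the loop itself is the same as in the plain version.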


    As an aside: why replace a simple [quote] tag with two divs, when a single HTML tag is enough and the blockquote tag exists?
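
A minimal sketch of that suggestion, reusing the multi-pass loop with a blockquote replacement. The sample input is made up for illustration.

```php
<?php
// Same multi-pass loop as in the answer; only the replacement string differs.
$pat  = '~ \[quote] ( [^[]*+ (?: \[ (?! /? quote] ) [^[]* )*+ ) \[/quote] ~x';
$repl = '<blockquote>$1</blockquote>';

$result = '[quote] outer [quote] inner [/quote] [/quote]';
do {
    $result = preg_replace($pat, $repl, $result, -1, $count);
} while ($count);

echo $result, "\n";
// <blockquote> outer <blockquote> inner </blockquote> </blockquote>
```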