Tags: php, regex, substring, filtering, bbcode

How to remove all bbcode quote blocks by a specific user from a text?


I'm looking to remove quotes made with BBCode in PHP, like this example:

[quote=testuser]
[quote=anotheruser]a sdasdsa dfv rdfgrgre gzdf vrdg[/quote]
sdfsd fdsf dsf sdf[/quote]
the rest of the post text

I'm building a blocking system so users don't have to see content from people they have blocked. Say "testuser" is blocked: the viewer doesn't want to see that entire quoted part, including the second quote nested inside it, since that is part of the main quote.

So the post would be left with only:

the rest of the post text

I'm wondering what the best way to do this is. I was hoping to use a regex, but it's more complicated than I thought. Here is my attempt:

/\[quote\=testuser\](.*)\[\/quote\]/is

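However, because (.*) is greedy, the match runs to the last [/quote] in the whole text, so it also swallows any closing tags (and the text between them) that come after the blocked quote.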
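A minimal PHP sketch of the failure follows. The sample string and the lazy (.*?) variant are my own illustration, not part of the original question. The greedy pattern runs to the last [/quote] in the text, while a lazy one stops at the first [/quote], which closes the nested quote. Neither version can count nesting depth.

```php
<?php
// Hypothetical sample: a blocked quote containing a nested quote, then text we want to keep.
$text = '[quote=testuser]A[quote=other]B[/quote]C[/quote] keep [quote=other]D[/quote]';

// The question's greedy pattern: (.*) backtracks only to the LAST [/quote], so everything is removed.
$greedy = preg_replace('/\[quote\=testuser\](.*)\[\/quote\]/is', '', $text);
var_dump($greedy); // string(0) ""

// A lazy variant (assumption, not from the question): (.*?) stops at the FIRST [/quote],
// which belongs to the nested quote, so "C[/quote]" is left behind.
$lazy = preg_replace('/\[quote=testuser\](.*?)\[\/quote\]/is', '', $text);
var_dump($lazy); // string(37) "C[/quote] keep [quote=other]D[/quote]"
```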

Is there an alternative method that's fast, or a good fix for my regex?

To sum up: remove the blocked user's outermost quote block and everything inside it, but nothing outside it.


Solution

  • This is not a simple process, as far as I can tell. Here are my steps:

    1. Use preg_split() to divide the input string into three kinds of pieces: opening quote tags, closing quote tags, and everything else. I split on the opening and closing tags, but use PREG_SPLIT_DELIM_CAPTURE to keep the tags in the output array in their original position/order. PREG_SPLIT_NO_EMPTY drops empty strings so there are no useless iterations in the foreach loop that follows.
    2. Loop through the generated array and look for an opening quote tag from the user whose quotes should be omitted.
    3. When a quote by the targeted user is found, store the index of that element in $start and set $opens to 1.
    4. Whenever another opening quote tag is found inside that block, $opens is incremented.
    5. Whenever a closing quote tag is found, $opens is decremented.
    6. As soon as $opens reaches 0, $start and the current $index are fed to range() to generate an array of every index between the two points (inclusive).
    7. array_flip() turns those values into keys.
    8. array_diff_key() removes those keys (the whole quote block) from the array generated by preg_split().
    9. If all things go smoothly, implode() will glue the substrings back together retaining only the desired components.
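To make steps 1 and 6-9 concrete, here is a tiny standalone trace on a hypothetical one-quote string. It uses the same preg_split() pattern and flags as the function below; the indices 1..3 are hard-coded here to stand in for $start and the closing tag's $index.

```php
<?php
// Step 1: split on quote tags, keeping the tags (DELIM_CAPTURE) and dropping empty strings (NO_EMPTY).
$bbcode = 'a[quote=x]b[/quote]c';
$parts = preg_split('~(\[/?quote[^]]*\])~', $bbcode, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// $parts is [0 => 'a', 1 => '[quote=x]', 2 => 'b', 3 => '[/quote]', 4 => 'c']

// Steps 6-8: the block to drop spans indices 1..3 (opening tag at 1, matching closing tag at 3).
$dropKeys = array_flip(range(1, 3));        // [1 => 0, 2 => 1, 3 => 2]: the indices become keys
$kept = array_diff_key($parts, $dropKeys);  // [0 => 'a', 4 => 'c']: surviving keys are preserved

// Step 9: glue the surviving pieces back together.
$result = implode($kept);
echo $result, "\n"; // ac
```

Because array_diff_key() preserves the original keys, several blocks can be removed one after another using indices from the original split.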

    Function Declaration:

    /*
    This function DOES NOT validate that the $bbcode string contains a balanced number of opening & closing tags.
    This function DOES check that there are enough closing tags to conclude a targeted opening tag.
    */
    function omit_user_quotes($bbcode, $user) {
        // split on [quote...] and [/quote...] tags, keeping each tag as its own element
        $substrings = preg_split('~(\[/?quote[^]]*\])~', $bbcode, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
        $opens = 0;  // necessary declaration to avoid a Notice when there are no quote tags in the $bbcode string
        foreach ($substrings as $index => $substring) {
            if (!isset($start) && $substring === "[quote={$user}]") {  // found the targeted user's outermost opening quote
                $start = $index;  // disqualify the first if statement and start searching for the end tag
                $opens = 1;  // $opens counts how many end tags are required to conclude the quote block
            } elseif (isset($start)) {
                if (preg_match('~^\[quote(?:=[^]]*)?\]$~', $substring)) {  // a nested opening tag: [quote] or [quote=name]
                    ++$opens;  // increment and continue looking for closing quote tags
                } elseif ($substring === '[/quote]') {  // a closing quote tag
                    --$opens;  // decrement and check whether the targeted block is concluded
                    if (!$opens) {  // if $opens is zero ($opens can never be less than zero)
                        // slice away the unwanted elements; foreach iterates over a copy, so this is safe
                        $substrings = array_diff_key($substrings, array_flip(range($start, $index)));
                        unset($start);  // re-qualify the first if statement to allow the process to repeat
                    }
                }
            }
        }
        if ($opens) {  // if $opens is positive, a targeted block was never closed
            return 'Error due to opening/closing tag imbalance (too few end tags)';
        }
        return trim(implode($substrings));  // also trims whitespace on either side of the result
    }
    

    Test Input:

    /* Single unwanted quote with nested innocent quote: */
    /*$bbcode='[quote=testuser]
    [quote=anotheruser]a sdasdsa dfv rdfgrgre gzdf vrdg[/quote]
    sdfsd fdsf dsf sdf[/quote]
    the rest of the test'; */
    /* output: the rest of the test */
    
    /* Complex battery of unwanted, wanted, and nested quotes: */
    $bbcode = '[quote=mickmackusa]Keep this[/quote]
    [quote=testuser]Don\'t keep this because 
        [quote=mickmackusa]said don\'t do it[/quote]
        ... like that\'s a good reason
        [quote=NaughtySquid] It\'s tricky business, no?[/quote]
        [quote=nester][quote=nesty][quote=nested][/quote][/quote][/quote]
    [/quote]
    Let\'s remove a second set of quotes
    [quote=testuser]Another quote block[/quote]
    [quote=mickmackusa]Let\'s do a third quote inside of my quote...
    [quote=testuser]Another quote block[/quote]
    [/quote]
    This should be good, but
    What if [quote=testuser]quotes himself [quote=testuser] inside of his own[/quote] quote[/quote]?';
    
    /* No quotes: */
    //$bbcode='This has no bbcode quote tags in it.';
    /* output: This has no bbcode quote tags in it. */
    
    /* Too few end quote tags by innocent user:
    (No error is returned because the targeted user has no quote block in the text) */
    //$bbcode='This [quote=mickmackusa] has no end tag.';
    /* output: This [quote=mickmackusa] has no end tag. */
    
    /* Too few end quote tags by unwanted user: */
    //$bbcode='This [quote=testuser] has no end tag.';
    /* output: Error due to opening/closing tag imbalance (too few end tags) */
    
    /* Too many end quote tags by unwanted user:
    (No error is returned because the function does not validate the bbcode text as fully balanced) */
    //$bbcode='This [quote=testuser] has too many end[/quote] tags.[/quote]';
    /* output: This  tags.[/quote] */
    

    Function Call:

    $user = 'testuser';
    
    echo omit_user_quotes($bbcode, $user);  // omit a single user's quote blocks
    
    /* Or if you want to omit quote blocks from multiple users, you can use a loop:
    $users = ['mickmackusa', 'NaughtySquid'];
    foreach ($users as $user) {
        $bbcode = omit_user_quotes($bbcode, $user);
    }
    echo $bbcode;
    */
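The question also asked whether the regex itself can be fixed. PHP's PCRE engine supports recursive subroutine calls ((?&name)), which can match balanced nested quotes. The helper below is a swapped-in technique and a minimal sketch, not part of the answer above; the name omit_user_quotes_regex() is hypothetical. It assumes tags are written exactly as [quote], [quote=name] and [/quote].

```php
<?php
// Hypothetical helper: deletes every balanced [quote=$user]...[/quote] block with one recursive
// PCRE pattern. Assumes tags look exactly like [quote], [quote=name] and [/quote]; a targeted
// block without enough closing tags simply does not match and is left in the text.
function omit_user_quotes_regex(string $bbcode, string $user): string
{
    $pattern = '~\[quote=' . preg_quote($user, '~') . '\]'  // the blocked user's opening tag
        . '(?<body>(?:'
        . '[^[]++'                                   // a run of text containing no "["
        . '|\[(?!/?quote\b)'                         // a "[" that does not start a quote tag
        . '|\[quote(?:=[^]]*)?\](?&body)\[/quote\]'  // a nested quote block, balanced by recursing into "body"
        . ')*)'
        . '\[/quote\]~';                             // the closing tag that matches the opening one

    $result = preg_replace($pattern, '', $bbcode);
    if ($result === null) {  // e.g. a PCRE backtrack or recursion limit was hit
        throw new RuntimeException('preg_replace() failed with error code ' . preg_last_error());
    }
    return trim($result);  // mirror omit_user_quotes(), which also trims its result
}

$bbcode = "[quote=testuser]\n[quote=anotheruser]a sdasdsa dfv rdfgrgre gzdf vrdg[/quote]\nsdfsd fdsf dsf sdf[/quote]\nthe rest of the post text";
echo omit_user_quotes_regex($bbcode, 'testuser'), "\n";  // the rest of the post text
```

Unlike omit_user_quotes(), this sketch returns the text unchanged, rather than an error string, when a targeted block has too few end tags. Very long posts could also run into PCRE's backtrack or recursion limits, which is why the null check is there. Multiple users can be handled with the same foreach loop shown above.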