
Trying to imitate Markdown's blockquote functionality in a PHP WYSIWYG


I'm building a WYSIWYG editor from the ground up as an academic exercise, and I've hit a snag. I want it to support some basic Markdown syntax, like asterisks for bold/italic, hash marks for headings, etc. However, I've run into an issue with blockquotes. Here's the code I'm currently using to process the input for blockquotes:

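$content = str_replace("\r\n", "\n", $_POST['content']); // textareas submit CRLF line breaks
// Wrap every line that starts with "> " in <blockquote> tags.
// The regex needs delimiters, and the /m modifier makes ^ and $ match at every
// line boundary. preg_replace already replaces all matches, so no loop is needed.
$content = preg_replace('/^> ?(.*)$/m', '<blockquote>$1</blockquote>', $content);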

Basically it looks for any line that starts with a 'greater than' sign, extracts the text and places it in blockquote tags like so:

input:
> this is a blockquote.

output:
<blockquote>this is a blockquote.</blockquote>
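To check the single-line case outside a form handler, the replacement can be wrapped in a small function. This is a minimal sketch: quoteLines() is a hypothetical name, and it assumes Markdown's optional single space after ">". It also shows why the /m modifier matters: without it, ^ and $ only match at the start and end of the whole string, so quotes after the first line are never found.

```php
<?php
// Hypothetical helper: wrap each line that starts with ">" in <blockquote> tags.
function quoteLines(string $text): string
{
    // Normalise Windows (CRLF) line endings so "$" sits right before "\n".
    $text = str_replace("\r\n", "\n", $text);
    // /m: "^" and "$" match at every line start/end, not only the string's.
    return preg_replace('/^> ?(.*)$/m', '<blockquote>$1</blockquote>', $text);
}

echo quoteLines("> this is a blockquote."), "\n";
// <blockquote>this is a blockquote.</blockquote>

// Without /m, "^" only matches at the very start of the string,
// so a quote on the second line is left untouched:
echo preg_replace('/^> ?(.*)$/', '<blockquote>$1</blockquote>', "intro\n> quoted"), "\n";
```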

That's great, but Markdown can also take a multiline blockquote and turn it into a single blockquote. For example:

input:
> this is a blockquote that
> i decided to separate across
> several lines.

output:
<blockquote>this is a blockquote that i decided to separate across several lines.</blockquote>
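The grouping Markdown does here can also be done without any lookahead by walking the input line by line: collect consecutive quoted lines, and emit one <blockquote> when a non-quoted line (or the end of the input) is reached. A minimal sketch, assuming quoted lines start with ">" at the very beginning of the line and are joined with single spaces; quoteBlocks() is a hypothetical helper name.

```php
<?php
// Sketch: group consecutive "> " lines by walking the input line by line.
// quoteBlocks() is a hypothetical name, not part of any library.
function quoteBlocks(string $text): string
{
    $lines = preg_split('/\R/', $text); // split on LF, CRLF or CR
    $out = [];
    $quote = [];                          // text of the blockquote being collected

    foreach ($lines as $line) {
        if (preg_match('/^> ?(.*)$/', $line, $m)) {
            $quote[] = $m[1];             // still inside a quote: collect the text
            continue;
        }
        if ($quote) {                     // a non-quoted line ends the current quote
            $out[] = '<blockquote>' . implode(' ', $quote) . '</blockquote>';
            $quote = [];
        }
        $out[] = $line;
    }
    if ($quote) {                         // flush a quote that runs to the end
        $out[] = '<blockquote>' . implode(' ', $quote) . '</blockquote>';
    }
    return implode("\n", $out);
}

echo quoteBlocks("> this is a blockquote that\n> i decided to separate across\n> several lines."), "\n";
// <blockquote>this is a blockquote that i decided to separate across several lines.</blockquote>
```

A blank line between two quoted runs ends the first run, so "> a", "", "> b" produces two separate blockquotes.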

I'd like to mimic that functionality, but with my current code I end up with this:

output:
<blockquote>this is a blockquote that</blockquote><blockquote>i decided to separate across</blockquote><blockquote>several lines.</blockquote>

I'm just not sure how to properly concatenate the blockquotes. One approach I thought of is to wrap each line first, then do a second search for </blockquote><blockquote> pairs that are not separated by a blank line and remove them, but that seems inefficient. The code would become:

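$content = str_replace("\r\n", "\n", $_POST['content']);
// Pass 1: wrap each quoted line; $count receives the number of replacements.
$content = preg_replace('/^> ?(.*)$/m', '<blockquote>$1</blockquote>', $content, -1, $count);
if ($count > 0) {
    // Pass 2: merge adjacent blockquotes separated by at most one newline,
    // joining the lines with a space so words don't run together.
    $content = preg_replace('#</blockquote>\n?<blockquote>#', ' ', $content);
}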

I suppose that would work, but I feel like there's a better method that uses a regex lookahead to grab all of the following quoted lines in one go. Unfortunately, I'm clueless as to what that would be.


Solution

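  • Use preg_replace_callback() with a pattern that matches a whole run of consecutive lines starting with >, then strip the > markers inside the callback. Replace <p> with any tag that you want :)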

    <?php
    
    $text = '
    line one
    line 2
    > blockquote blockquote blockquote blockquote blockquote
    > blockquote blockquote blockquote blockquote blockquote
    > blockquote blockquote blockquote.
    line 3
    
    > AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    > AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    > AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    
    any
    ';
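    // Called once per run of consecutive quoted lines; $matches[0] is the whole run.
    function blockquota($matches){
        // Strip the ">" marker (plus one optional space) only at the start of
        // each line, so a ">" inside the quoted text is kept.
        $mytext = preg_replace('/^\h*> ?/m', '', $matches[0]);
        $mytext = '<p>'.$mytext.'</p>';
        return $mytext;
    }
    // A line starting with ">", followed by any further lines that also start
    // with ">" exactly one line break (LF or CRLF) apart.
    $pattern = '/^\h*>\V*(?:\R\h*>\V*)*/m';
    echo htmlspecialchars(preg_replace_callback($pattern, 'blockquota', $text));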
    ?>
    

    OUTPUT:

    line one
    line 2
    <p>blockquote blockquote blockquote blockquote blockquote
    blockquote blockquote blockquote blockquote blockquote
    blockquote blockquote blockquote.</p>
    line 3
    
    <p>AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
    AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</p>
    
    any
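The answer's output keeps the original line breaks inside <p>. To get exactly the output asked for in the question (one <blockquote> per run, lines joined by spaces), the same pattern-plus-callback idea can be adapted. A sketch only: blockquoteRuns() is a hypothetical name, and it assumes ">" appears at the very start of each quoted line and that lines belonging to one quote are not separated by blank lines.

```php
<?php
// Variant of the callback approach that emits <blockquote> instead of <p>
// and joins the quoted lines with single spaces.
// blockquoteRuns() is a hypothetical helper name.
function blockquoteRuns(string $text): string
{
    // A run of one or more consecutive lines that start with ">".
    $pattern = '/^>\V*(?:\R>\V*)*/m';

    return preg_replace_callback($pattern, function (array $m): string {
        // Drop the ">" marker (and one optional space) at the start of each line...
        $inner = preg_replace('/^> ?/m', '', $m[0]);
        // ...then turn the line breaks inside the run into spaces.
        return '<blockquote>' . preg_replace('/\R/', ' ', $inner) . '</blockquote>';
    }, $text);
}

echo blockquoteRuns("intro\n> this is a blockquote that\n> i decided to separate across\n> several lines.\noutro"), "\n";
```

Because the pattern is anchored with ^ under /m, a ">" in the middle of an ordinary line (for example "a > b") is left alone.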