php wordpress preg-split

preg_split() breaks up the <br /> tag?


I have been struggling with this for half a day now and I can't seem to get it right. I have a custom function on my WordPress site that automatically creates an excerpt. This works well, but for some (I guess logical) reason it also cuts the <br /> tag apart, since the tag contains a space.

How can I fix this? It has to do with the preg_split function, right?

Below is my code:

function custom_wp_trim_excerpt($text) {
$raw_excerpt = $text;
if ( '' == $text ) {
    //Retrieve the post content. 
    $text = get_the_content('');

    //Delete all shortcode tags from the content. 
    $text = strip_shortcodes( $text );

    $text = apply_filters('the_content', $text);
    $text = str_replace(']]>', ']]&gt;', $text);

    $allowed_tags = '<p>,<br>,<br/>,<br />,<a>,<em>,<strong>,<img>'; /*** MODIFY THIS. Add the allowed HTML tags separated by a comma.***/
    $text = strip_tags($text, $allowed_tags);

    $excerpt_word_count = 40; /*** MODIFY THIS. change the excerpt word count to any integer you like.***/
    $excerpt_length = apply_filters('excerpt_length', $excerpt_word_count); 

    // $post is not defined inside this function; get_permalink() without arguments uses the current post.
    $excerpt_end = ' <a href="'. get_permalink() . '">' . '...' . '</a>'; 
    $excerpt_more = apply_filters('excerpt_more', ' ' . $excerpt_end);

    $words = preg_split("/[\n\r\t ]+/", $text, $excerpt_length + 1, PREG_SPLIT_NO_EMPTY);
    if ( count($words) > $excerpt_length && $words ) {
        array_pop($words);
        $text = implode(' ', $words);
        $text = $text . $excerpt_more;
    } else {
        $text = implode(' ', $words);
    }
}
return apply_filters('wp_trim_excerpt', $text, $raw_excerpt);
}
remove_filter('get_the_excerpt', 'wp_trim_excerpt');
add_filter('get_the_excerpt', 'custom_wp_trim_excerpt');

Thanks!


Solution

  • You can add this line to normalize every variant of the break tag to <br>, which contains no space:

    $text = preg_replace('!<br ?/>!i','<br>',$text);
    

    Before these lines:

    $allowed_tags = '<p>,<br>,<a>,<em>,<strong>,<img>'; /*** MODIFY THIS. Add the allowed HTML tags separated by a comma.***/
    $text = strip_tags($text, $allowed_tags);
    

    When you call preg_split("/[\n\r\t ]+/", $text), you are splitting on the space inside the <br /> tag, so the tag ends up broken into two pieces.

    You can also simplify the regex in the preg_split() statement:

    $words = preg_split("!\s+!", $text, $excerpt_length + 1, PREG_SPLIT_NO_EMPTY);
    

    Note, though, that since you're allowing the other tags, they probably contain spaces too (for example <a href="..."> or <img src="...">), so they can be split in the same way.
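
To cover the other allowed tags as well, one option (a sketch only, not part of the original answer) is to split on whitespace only when it sits outside an HTML tag. The helper name split_words_keeping_tags() and the sample string are made up for illustration, and the pattern assumes that attribute values never contain a literal ">".

```php
<?php
// Hypothetical helper: split text into "words" on whitespace,
// but never on whitespace that sits inside an HTML tag.
function split_words_keeping_tags(string $text, int $limit = -1): array
{
    // <[^>]*> matches a whole tag; (*SKIP)(*FAIL) throws that match away
    // and stops the engine from retrying inside it, so only whitespace
    // outside of tags is left to act as a delimiter.
    $pattern = '/<[^>]*>(*SKIP)(*FAIL)|\s+/';

    return preg_split($pattern, $text, $limit, PREG_SPLIT_NO_EMPTY);
}

// Made-up sample containing tags with spaces in them.
$sample = 'Hello <br /> world <a href="#" title="x y">link</a>';
$words  = split_words_keeping_tags($sample);

print_r($words);
```

In the function from the question, this would stand in for the existing preg_split() call, with $excerpt_length + 1 passed as the limit. Keep in mind that a standalone tag such as <br /> then counts as one word, and that cutting the list off in the middle of an <a> or <p> element can still leave an unclosed tag in the excerpt.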