php html substr strip-tags

Using PHP substr() and strip_tags() while retaining formatting and without breaking HTML


I have various HTML strings to cut to 100 characters (of the stripped content, not the original) without stripping tags and without breaking HTML.

Original HTML string (288 characters):

$content = "<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div over <div class='nestedDivClass'>there</div>
</div> and a lot of other nested <strong><em>texts</em> and tags in the air
<span>everywhere</span>, it's a HTML taggy kind of day.</strong></div>";

Standard trim: Trimming to 100 characters breaks the HTML, and the stripped content comes to only ~40 characters:

$content = substr($content, 0, 100)."..."; /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div ove... */
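
To make the gap concrete, here is a minimal sketch that measures both lengths for the cut above. It assumes the line break in $content is a single "\n"; the names $html and $cut are only for illustration.

```php
<?php
// Same opening as $content above, with the line break written as "\n"
// (an assumption about how the original string was typed).
$html = "<div>With a <span class='spanClass'>span over here</span> and a\n"
      . "<div class='divClass'>nested div over <div class='nestedDivClass'>there</div>";

$cut = substr($html, 0, 100);

echo strlen($cut), "\n";             // 100 bytes of markup
echo strlen(strip_tags($cut)), "\n"; // 42 visible characters
```

So a budget of 100 spent on the raw markup leaves well under half of it for text the reader actually sees.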

Stripped HTML: Outputs the correct character count but obviously loses formatting:

$content = substr(strip_tags($content), 0, 100)."..."; /* output:
With a span over here and a nested div over there and a lot of other nested
texts and tags in the ai... */

Partial solution: Using HTML Tidy or HTML Purifier to close off the tags outputs clean HTML, but the 100 characters are counted against the HTML source, not the displayed content.

$content = substr($content, 0, 100)."...";
$tidy = new tidy; $tidy->parseString($content); $tidy->cleanRepair(); /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div ove</div></div>... */

Challenge: Output clean HTML containing n characters of displayed text (tags excluded from the count):

$content = cutHTML($content, 100); /* output:
<div>With a <span class='spanClass'>span over here</span> and a
<div class='divClass'>nested div over <div class='nestedDivClass'>there</div>
</div> and a lot of other nested <strong><em>texts</em> and tags in the
ai</strong></div>... */

Solution

  • Not amazing, but works.

    function html_cut($text, $max_length)
    {
        $tags   = array();
        $result = "";
    
        $is_open   = false;
        $grab_open = false;
        $is_close  = false;
        $in_double_quotes = false;
        $in_single_quotes = false;
        $tag = "";
    
        $i = 0;
        $stripped = 0;
    
        $stripped_text = strip_tags($text);
    
        while ($i < strlen($text) && $stripped < strlen($stripped_text) && $stripped < $max_length)
        {
            $symbol  = $text[$i];
            $result .= $symbol;
    
            switch ($symbol)
            {
               case '<':
                    $is_open   = true;
                    $grab_open = true;
                    break;
    
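                case '"':
                    // Quotes delimit attribute values only inside a tag;
                    // in text content they are ordinary visible characters.
                    if ($is_open && !$in_single_quotes)
                        $in_double_quotes = !$in_double_quotes;
                    else if (!$is_open && !$is_close)
                        $stripped++;
    
                    break;
    
                case "'":
                    if ($is_open && !$in_double_quotes)
                        $in_single_quotes = !$in_single_quotes;
                    else if (!$is_open && !$is_close)
                        $stripped++;
    
                    break;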
    
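                case '/':
                    if ($is_open && $grab_open && $tag === "")
                    {
                        // A '/' directly after '<' starts a closing tag.
                        $is_close  = true;
                        $is_open   = false;
                        $grab_open = false;
                    }
                    else if ($is_open && substr($text, $i + 1, 1) === '>')
                        $tag = "";      // self-closing "<br />": nothing to close later
                    else if (!$is_open && !$is_close)
                        $stripped++;    // a literal '/' in text content
    
                    break;
    
                case ' ':
                    if ($is_open)
                        $grab_open = false;
                    else if (!$is_close)
                        $stripped++;
    
                    break;
    
                case '>':
                    if ($is_open)
                    {
                        $is_open   = false;
                        $grab_open = false;
                        // Self-closed tags (name cleared above) and void
                        // elements never get a closing tag, so skip them.
                        $void = array('area', 'base', 'br', 'col', 'embed', 'hr', 'img',
                                      'input', 'link', 'meta', 'source', 'track', 'wbr');
                        if ($tag !== "" && !in_array(strtolower($tag), $void))
                            array_push($tags, $tag);
                        $tag = "";
                    }
                    else if ($is_close)
                    {
                        $is_close = false;
                        array_pop($tags);
                        $tag = "";
                    }
                    else
                        $stripped++;    // a literal '>' in text content
    
                    break;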
    
                default:
                    if ($grab_open || $is_close)
                        $tag .= $symbol;
    
                    if (!$is_open && !$is_close)
                        $stripped++;
            }
    
            $i++;
        }
    
        while ($tags)
            $result .= "</".array_pop($tags).">";
    
        return $result;
    }
    

    Usage example:

    $content = html_cut($content, 100);
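
    The function above does not add the trailing "..." that the question asks for. A minimal sketch of a wrapper that could do so follows; cut_with_ellipsis() and its $cutter parameter are hypothetical names, not part of the answer, and the sketch assumes html_cut() is already loaded when the default is used.

```php
<?php
// Hypothetical wrapper (not part of the answer above): appends "..." only
// when the visible text was actually shortened.
function cut_with_ellipsis($html, $max_length, $cutter = 'html_cut')
{
    // Compare against the tag-free length, the same measure html_cut() uses.
    $was_cut = strlen(strip_tags($html)) > $max_length;
    $result  = call_user_func($cutter, $html, $max_length);

    return $was_cut ? $result . "..." : $result;
}

// Only runs when html_cut() from above has been loaded.
if (function_exists('html_cut'))
    echo cut_with_ellipsis("<p>Hello <b>world</b></p>", 8), "\n";
```

    Appending the ellipsis after the closing tags matches the output shown in the question; placing it inside the innermost open element would need a change to html_cut() itself.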