
Insert a tag after a certain word in DOMDOCUMENT PHP 7.4


I have an li element that I get using DOMDocument. Its text contains a URL, which I need to replace with an a tag pointing to the same URL. The main problem is keeping the position: if the URL is in the middle of the text, the link should stay there. How can I do this?

before:
<li>Lorem lorem lorem lorem lorem lorem http://... lorem lorem lorem lorem</li>
after:
<li>Lorem lorem lorem lorem lorem lorem <a href="http://...">http://...</a> lorem lorem lorem lorem</li>

I am thinking about whether it is possible to completely replace the entire tag with

$liElement->parentNode->replaceChild($link, $liElement);

but then the a tag will look like text and not a tag.


Solution

  • Let's assume that the input is the following:

    $html = <<<'HTML'
    <html>
        <ul>
            <li>Item 1 http://example.com/page1</li>
            <li>Item 2 http://example.com/page2</li>
            <li>Item 3 http://example.com/page3</li>
        </ul>
    </html>
    HTML;
    

    To replace the plain-text URLs with a elements, you first need to get the DOM nodes. One way to do this is with the DOMXPath class. Here is an example:

    $doc = new DOMDocument();
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    
    $xpath = new DOMXPath($doc);
    $listItems = $xpath->query('//li');
    

    The code above will get all the li elements in the document. Now you can iterate over the child nodes of the list items. If the child node is a text node, build a DOMDocumentFragment with the a element replacing the plain text URL. Here is an example:

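    foreach ($listItems as $listItem) {
        // childNodes is a live list, so iterate over a snapshot while replacing nodes.
        foreach (iterator_to_array($listItem->childNodes) as $childNode) {
            if ($childNode->nodeType !== XML_TEXT_NODE) {
                continue;
            }

            // Escape &, <, > and " first: appendXML() expects well-formed XML.
            $escaped = htmlspecialchars($childNode->nodeValue, ENT_XML1 | ENT_COMPAT);
            $newText = preg_replace(
                '/(https?:\/\/[a-z0-9\/.]+)/',
                '<a href="$1">$1</a>',
                $escaped,
                -1,
                $count
            );
            if ($count === 0) {
                continue; // no URL in this text node, leave it untouched
            }

            $newNode = $doc->createDocumentFragment();
            $newNode->appendXML($newText);

            $listItem->replaceChild($newNode, $childNode);
        }
    }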
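
    As a variation on the loop above, the text nodes can also be selected directly with XPath, which reaches text nested inside other elements of the list item as well. The snippet below is only a sketch: the sample markup is made up for illustration, and it shows the selection step only, not the replacement.

    ```php
    <?php
    // Sample markup (an assumption for this sketch): one URL sits inside a nested <b>.
    $html = '<ul><li>Item 1 http://example.com/page1</li>'
          . '<li>Item 2 <b>see http://example.com/page2</b></li>'
          . '<li>Item 3 without a link</li></ul>';

    $doc = new DOMDocument();
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    $xpath = new DOMXPath($doc);
    // text() selects text nodes; the // axis also descends into nested elements.
    // The predicate keeps only text nodes that contain the substring "http".
    $textNodes = $xpath->query('//li//text()[contains(., "http")]');

    foreach ($textNodes as $textNode) {
        echo trim($textNode->nodeValue), PHP_EOL;
    }
    ```

    Each selected node can then be replaced through its own parentNode, which is not necessarily the li itself when the text is nested.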
    

    Finally, print the modified HTML:

    echo $doc->saveHTML();
    

    The output for the input above should be the following:

    <html>
        <ul>
            <li>Item 1 <a href="http://example.com/page1">http://example.com/page1</a></li>
            <li>Item 2 <a href="http://example.com/page2">http://example.com/page2</a></li>
            <li>Item 3 <a href="http://example.com/page3">http://example.com/page3</a></li>
        </ul>
    </html>
    

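    Note that the regular expression in the code above is purely illustrative: it matches only lowercase letters, digits, slashes and dots, so URLs containing hyphens, query strings or uppercase characters are cut short. Adjust the pattern for real input, and make sure the text is escaped (e.g., with htmlspecialchars) before it is passed to appendXML, which expects well-formed XML.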
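
    One way to sidestep the escaping question entirely is to build the nodes with createElement and createTextNode instead of parsing a string with appendXML. The sketch below assumes a hypothetical helper named linkifyText and a deliberately loose URL pattern (it would also swallow trailing punctuation); neither comes from the original answer, and the sample markup is made up.

    ```php
    <?php
    // Hypothetical helper: returns a fragment in which every URL found in $text
    // becomes an <a> element and everything else stays a plain text node.
    function linkifyText(DOMDocument $doc, string $text): DOMDocumentFragment
    {
        $fragment = $doc->createDocumentFragment();
        // With PREG_SPLIT_DELIM_CAPTURE the captured URLs land at odd indexes
        // and the surrounding plain text at even indexes.
        $parts = preg_split('~(https?://[^\s<>"]+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

        foreach ($parts as $index => $part) {
            if ($part === '') {
                continue; // empty piece before or after a URL
            }
            if ($index % 2 === 1) {
                $link = $doc->createElement('a');
                $link->setAttribute('href', $part);
                $link->appendChild($doc->createTextNode($part));
                $fragment->appendChild($link);
            } else {
                $fragment->appendChild($doc->createTextNode($part));
            }
        }

        return $fragment;
    }

    $doc = new DOMDocument();
    $doc->loadHTML(
        '<ul><li>Tom &amp; Jerry http://example.com/a?x=1&amp;y=2 end</li></ul>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//li') as $listItem) {
        // Snapshot the live childNodes list before modifying it.
        foreach (iterator_to_array($listItem->childNodes) as $childNode) {
            if ($childNode->nodeType === XML_TEXT_NODE) {
                $listItem->replaceChild(linkifyText($doc, $childNode->nodeValue), $childNode);
            }
        }
    }

    echo $doc->saveHTML();
    ```

    Because the text never passes through an XML parser here, characters such as & in the surrounding text or in a query string need no manual escaping; the serializer takes care of that when saveHTML is called.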