php domdocument

How can I prevent PHP's DOMDocument::saveHTML() from encoding characters in an href attribute?


Due to custom storage needs (the "why" is not important here, thanks!) I have to save HTML <a> links in a specific format such as this:

$myDOMNode->setAttribute("href", "{{{123456}}}");

Everything works fine until I call saveHTML() on the containing DOMDocument. This breaks the placeholder, since it encodes { as %7B and } as %7D.

This is a legacy application where href="{{{123456}}}" works as a placeholder. The command-line parser looks for exactly this pattern (unencoded) and cannot be changed.

I've no choice but to do it this way.

I cannot htmldecode() the result.

This HTML will never be displayed as is; it only needs to be stored.

Thanks for your help!

Note: I've looked around for 2 hours but none of the proposed solutions worked for me. For those who will blindly mark the question as a duplicate: please comment and let me know.


Solution

  • Since the legacy code uses {{{...}}} as a placeholder, a somewhat hackish approach with preg_replace_callback is probably safe. The following restores the URL-encoded placeholders after the HTML has been generated:

    $src = <<<EOS
    <html>
        <body>
            <a href="foo">Bar</a>
        </body>
    </html>
    EOS;
    
    // Create DOM document
    $dom = new DOMDocument();
    $dom->loadHTML($src);
    
    // Alter the `href` attribute of the first anchor.
    // (The return value of setAttribute() is not the element, so it is not kept.)
    $dom->getElementsByTagName('a')
        ->item(0)
        ->setAttribute('href', '{{{123456}}}');
    
    // Callback function to URL decode match
    $urldecode = function ($matches) {
        return urldecode($matches[0]);
    };
    
    // Turn DOMDocument into HTML string, then restore/urldecode placeholders 
    $html = preg_replace_callback(
        '/' . urlencode('{{{') . '\d+' . urlencode('}}}') . '/',
        $urldecode,
        $dom->saveHTML()
    );
    
    echo $html, PHP_EOL;
    

    Output (indented for clarity):

    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html>
        <body>
            <a href="{{{123456}}}">Bar</a>
        </body>
    </html>
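
If the restore step is needed in more than one place, it could be wrapped in a small function. The following is only a sketch under the question's stated assumption that a placeholder is always three braces around digits; `restorePlaceholders` is a hypothetical name, not part of PHP or of the legacy application. It works on the string returned by saveHTML(), so it can be tried without a DOMDocument:

```php
<?php
// Hypothetical helper (name chosen for illustration only): turns
// URL-encoded {{{digits}}} placeholders back into their literal form.
function restorePlaceholders(string $html): string
{
    return preg_replace_callback(
        // %7B and %7D are the URL-encoded forms of { and };
        // the "i" flag also accepts lowercase hex digits (%7b, %7d).
        '/(?:%7B){3}\d+(?:%7D){3}/i',
        function (array $matches): string {
            return urldecode($matches[0]);
        },
        $html
    );
}

// Stand-in for the string that $dom->saveHTML() would return.
$encoded = '<a href="%7B%7B%7B123456%7D%7D%7D">Bar</a>';

echo restorePlaceholders($encoded), PHP_EOL;
```

Because the pattern only matches the complete encoded placeholder, other percent-encoded sequences in the document (for example %20 in ordinary URLs) are left untouched.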