anchor htmlpurifier

HTMLPurifier removing <a name="#someanchorname"></a> - how to stop this from happening?


Using HTML Purifier 4.10, anchors that carry a name attribute are being stripped out of my text.

Current config:

    $class_file         = 'static/htmlpurifier-4.10.0-lite/library/HTMLPurifier.auto.php';
    $class_html_cleaner = 'HTMLPurifier';
    require_once($class_file);

    // Initiate config 
    $config     = HTMLPurifier_Config::createDefault();
    $config->set('AutoFormat.AutoParagraph', FALSE);
    $config->set('AutoFormat.RemoveEmpty', TRUE);
    $config->set('AutoFormat.RemoveEmpty.RemoveNbsp', TRUE);

    // initiate class
    $purifier   = new HTMLPurifier($config);

    // clean passed HTML
    $html       = $purifier->purify($html);

Adding the config HTML.Allowed:

    $config->set('AutoFormat.AutoParagraph', FALSE);
    $config->set('AutoFormat.RemoveEmpty', TRUE);
    $config->set('AutoFormat.RemoveEmpty.RemoveNbsp', TRUE);
    $config->set('HTML.Allowed', 'a[href|target|name|id|class]'); 

Does nothing; the named anchors are still removed.

Removing three AutoFormat options so I just have this:

    $config->set('HTML.Allowed', 'a[href|target|name|id|class]'); 

Also strips the name attribute, but at least the anchor I posted now comes back as <a></a>.

What else am I missing here? I'd rather not use HTML.Allowed if it means I have to explicitly state every other potential tag/attribute I would ever use.

Guidance/help greatly appreciated. Been fighting with this for an hour now.


Solution

  • The Attr.EnableID directive is off by default, so HTML Purifier removes id attributes. (It looks like name attributes are removed as well.) Enabling it should let them through: http://htmlpurifier.org/live/configdoc/plain.html#HTML.EnableAttrID

    Why IDs are disabled by default is explained here: http://htmlpurifier.org/docs/enduser-id.html.
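
    A minimal sketch of that change, reusing the library path from the question. The sample markup is hypothetical, and the anchor name is written without the leading # on the assumption that name values are validated the same way as IDs, which as far as I know must start with a letter. Whether the attribute survives may also depend on the doctype in use, so treat this as a starting point rather than a confirmed fix.

        require_once 'static/htmlpurifier-4.10.0-lite/library/HTMLPurifier.auto.php';

        $config = HTMLPurifier_Config::createDefault();

        // Off by default; turning it on should let id (and, per the note
        // above, name) attributes through.
        $config->set('Attr.EnableID', true);

        $purifier = new HTMLPurifier($config);

        // Hypothetical input: the anchor from the question, minus the leading '#'.
        $html = '<p><a name="someanchorname"></a>Some text</p>';
        echo $purifier->purify($html);

    This leaves HTML.Allowed unset, so the rest of the default whitelist stays in place. If user-supplied IDs could clash with IDs used elsewhere on the page, the Attr.IDPrefix directive in the configdoc may be worth a look.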