php filter htmlpurifier

HtmlPurifier - allow data attribute


I'm trying to allow some data-* attributes on all my span elements with HTML Purifier, but I can't get it to work.

I have this string:

<p>
    <span data-time-start="1" data-time-end="5" id="5">
       <word class="word">My</word>
       <word class="word">Name</word>
    </span>
    <span data-time-start="6" data-time-end="15" id="88">
       <word class="word">Is</word>
       <word class="word">Zooboo</word>
    </span>
</p>

My htmlpurifier config:

$this->HTMLpurifierConfigInverseTransform = \HTMLPurifier_Config::createDefault();
$this->HTMLpurifierConfigInverseTransform->set('HTML.Allowed', 'span,u,strong,em');
$this->HTMLpurifierConfigInverseTransform->set('HTML.ForbiddenElements', 'word,p');
$this->HTMLpurifierConfigInverseTransform->set('CSS.AllowedProperties', 'font-weight, font-style, text-decoration');
$this->HTMLpurifierConfigInverseTransform->set('AutoFormat.RemoveEmpty', true);

I purify my $value like this:

$purifier = new \HTMLPurifier($this->HTMLpurifierConfigInverseTransform);
var_dump($purifier->purify($value));die;

And I get this:

<span>My Name</span><span>Is Zooboo</span>

How can I keep the id, data-time-start and data-time-end attributes on my span elements?

I need to get this:

<span data-time-start="1" data-time-end="5" id="5">My Name</span><span data-time-start="6" data-time-end="15" id="88">Is Zooboo</span>

I tried this config:

$this->HTMLpurifierConfigInverseTransform->set('HTML.Allowed', 'span[data-time-start],u,strong,em');

but I get this error message:

User Warning: Attribute 'data-time-start' in element 'span' not supported (for information on implementing this, see the support forums)

Thanks for your help!

EDIT 1

I first tried to allow the id attribute with this line:

$this->HTMLpurifierConfigInverseTransform->set('Attr.EnableID', true);

It doesn't work for me.

EDIT 2

For the data-* attributes, I added these lines, but nothing changed either:

$def = $this->HTMLpurifierConfigInverseTransform->getHTMLDefinition(true);
$def->addAttribute('sub', 'data-time-start', 'CDATA');
$def->addAttribute('sub', 'data-time-end', 'CDATA');

Solution

  • HTML Purifier is aware of the structure of HTML and uses this knowledge as the basis of its whitelisting process. If you add a standard attribute to a whitelist, it doesn't allow arbitrary content for that attribute: it understands the attribute and will still reject content that makes no sense.

    For example, if you had an attribute somewhere that took numeric values, HTML Purifier would still deny HTML that tried to enter the value 'foo' for that attribute.
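
    A minimal sketch of that behaviour, assuming HTML Purifier is installed through Composer (the vendor/autoload.php path is an assumption) and using colspan as the numeric attribute. With the default configuration, the cell should come back without the invalid colspan value:

    ```php
    <?php
    // Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
    require_once __DIR__ . '/vendor/autoload.php';

    $config   = HTMLPurifier_Config::createDefault();
    $purifier = new HTMLPurifier($config);

    // colspan is a standard attribute that expects a number,
    // so 'foo' is not a value HTML Purifier considers valid.
    $dirty = '<table><tr><td colspan="foo">cell</td></tr></table>';

    // Print the purified markup to compare it with the input.
    echo $purifier->purify($dirty);
    ```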

    If you add custom attributes, just adding them to the whitelist does not teach HTML Purifier how to handle them: What data can it expect in those attributes? What data is malicious?

    There's extensive documentation on how to tell HTML Purifier about the structure of your custom attributes in its "Customize!" guide (enduser-customize.html).

    There's a code example for the 'target' attribute of the <a>-tag:

    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.DefinitionID', 'enduser-customize.html tutorial');
    $config->set('HTML.DefinitionRev', 1);
    $config->set('Cache.DefinitionImpl', null); // remove this later!
    $def = $config->getHTMLDefinition(true);
    $def->addAttribute('a', 'target', 'Enum#_blank,_self,_target,_top');
    

    That would add target as a field that accepts only the values "_blank", "_self", "_target" and "_top". That's a bit stricter than the actual HTML definition, but for most purposes entirely sufficient.

    That's the general approach you will need to take for data-time-start and data-time-end. For possible configuration, check out the official HTML Purifier documentation (as linked above). My best guess from your example is that you don't want Enum#... but Number, like this...

    $def->addAttribute('span', 'data-time-start', 'Number');
    $def->addAttribute('span', 'data-time-end', 'Number');
    

    ...but check it out and see what suits your use-case best. (While you're implementing this, don't forget you also need to list the attributes in the whitelist as you're currently doing.)

    For id, you should include Attr.EnableID = true as part of your configuration.
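
    Put together, the whole setup might look like the sketch below. It is not a tested drop-in. The Composer autoload path and the DefinitionID string 'span-data-time' are assumptions made for illustration. It uses maybeGetRawHTMLDefinition(), the cache-friendly variant of getHTMLDefinition(true), and it sets Attr.ID.HTML5 because, as far as I know, the default ID rules reject ids that start with a digit such as "5" or "88". That directive only exists in newer releases (4.8.0 and later, I believe), so drop that line on older versions.

    ```php
    <?php
    // Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
    require_once __DIR__ . '/vendor/autoload.php';

    $config = HTMLPurifier_Config::createDefault();

    // Whitelist the span element together with the attributes it may carry.
    $config->set('HTML.Allowed', 'span[id|data-time-start|data-time-end],u,strong,em');

    // id attributes are stripped unless this is enabled.
    $config->set('Attr.EnableID', true);
    // Assumption: needed so that purely numeric ids ("5", "88") are accepted.
    $config->set('Attr.ID.HTML5', true);

    // Hypothetical identifier for the customized definition; bump the
    // revision number whenever the definition below changes.
    $config->set('HTML.DefinitionID', 'span-data-time');
    $config->set('HTML.DefinitionRev', 1);

    // All set() calls must come before this point. The call returns null
    // when a cached definition already exists, so the block is skipped then.
    if ($def = $config->maybeGetRawHTMLDefinition()) {
        // Teach HTML Purifier what the custom attributes contain.
        $def->addAttribute('span', 'data-time-start', 'Number');
        $def->addAttribute('span', 'data-time-end', 'Number');
    }

    $purifier = new HTMLPurifier($config);

    $value = '<p><span data-time-start="1" data-time-end="5" id="5">'
           . '<word class="word">My</word> <word class="word">Name</word>'
           . '</span></p>';

    // Print the purified markup to check which attributes survived.
    echo $purifier->purify($value);
    ```

    Note that the attributes are added to 'span' here; the attempt in EDIT 2 added them to 'sub', which would leave span unchanged.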

    I hope that helps!