I have a strange issue. I use CKEditor-4 to collect formatted text from user in form of html. Also, the html content is filtered using htmlpurifier from the server.
When the user use quotes like ”
, ’
and “
CKEditor converts them into html entities like ”
, ’
, and “
, which is fine. The issue is, when I filter them using htmlpurifier - this quotes get's automatically decoded. This prevents the content from: being presented to user for later edit as the quotes are literally encoded in strage ways like “
How do i fix this? I think, if I could stop htmlpurifier from automatically decoding things, this would work, But I am new to htmlpurifier - so I can't find a way.
I have tried using htmlentities
before passing it to htmlpurifier. but it would encode the whole html, Hence: stopping htmlpurifier from purifying html at all.
After CBroe's comment, I found out that my application is not using UTF-8 all the way through.
And I can't rectify it also. For those who are in similar situation, I found a work-around. htmlPurifier does support a configuration to encode all non-ASCII charecters with some trade-offs - It's fine with my case(I think).
you can enable the htmlpurifier config Core.EscapeNonASCIICharacters
like so
$config->set('Core.EscapeNonASCIICharacters', true);
which did the trick for me.
This is the full function
/**
* Purifies dirty html
*
* @param string $dirty_html
* @return string
*/
function purifyHtml($dirty_html)
{
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.Encoding', 'UTF-8');
$config->set('Core.EscapeNonASCIICharacters', true);
$config->set('HTML.Doctype', 'HTML 4.01 Transitional');
$config->set('Cache.SerializerPath', getStoragePath('cache/html-purifier'));
$htmlPurifier = new HTMLPurifier($config);
return $htmlPurifier->purify($dirty_html);
}