php html htmlpurifier

Why don't emojis and special characters display properly after HTML purification with PHP?


I have a blog that lets users write content in one place using TinyMCE. The content is then sanitized, validated, and stored in the database, and it is displayed on a different page. Pretty standard. However, I allow users to use emojis and special characters. Before storing the body content I run it through HTMLPurifier, and I run it through HTMLPurifier a second time before display.

This is the code for purifying the body content before it goes into the database:

//Purify the body HTML
$purifier = new HTMLPurifier();
$news_data['body']['clean'] = $purifier->purify($news_data['body']['raw']);

The emojis look normal in the database, but on the webpage I get the same garbled characters even when I output the raw data (see screenshot).

This is the display code:

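// $config is created elsewhere; its setup is not shown here
$purifier = new HTMLPurifier($config);
$purified_body = $purifier->purify($row['body']);
// Decode HTML entities back into UTF-8 characters
$clean_body = html_entity_decode($purified_body, ENT_QUOTES, 'UTF-8');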

My output: [screenshot not reproduced]

So, why aren't my emojis and special characters displaying properly if they appear raw in the database correctly?

Edit: I checked and the database collation is set to utf8mb4_unicode_ci

Thanks!


Solution

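  • The Content-Type header was set to charset=ISO-8859-1 instead of UTF-8, so the browser decoded the UTF-8 bytes coming from the database as Latin-1. Setting the charset explicitly fixes this: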

    header("Content-Type: text/html; charset=utf-8");
    

    This resolved the issue.
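
To see why the wrong charset produces this kind of output, here is a minimal sketch that reproduces the effect outside the browser. The function names `decode_as_latin1` and `send_utf8_header` are hypothetical helpers made up for this illustration, not part of HTMLPurifier or the original code, and the example assumes the mbstring extension is available.

```php
<?php
// Hypothetical helper: reinterpret UTF-8 bytes as if they were ISO-8859-1,
// which is roughly what the browser does when the header declares the
// wrong charset. Assumes the mbstring extension is loaded.
function decode_as_latin1(string $utf8Bytes): string
{
    // Each byte is treated as one Latin-1 character and re-encoded as UTF-8.
    return mb_convert_encoding($utf8Bytes, 'UTF-8', 'ISO-8859-1');
}

// Hypothetical helper: declare UTF-8 before any output has been sent.
// Returns false if output already started and the header can no longer
// be changed.
function send_utf8_header(): bool
{
    if (headers_sent()) {
        return false;
    }
    header('Content-Type: text/html; charset=utf-8');
    return true;
}

// "café" stored correctly as UTF-8 (the é is the two bytes C3 A9).
$stored = "caf\u{00E9}";

// Shows what a page served as ISO-8859-1 would render for that string.
echo decode_as_latin1($stored), PHP_EOL;
```

Run from the command line, this prints `cafÃ©`: the data itself is fine, and only the declared charset makes it look broken. In a real page, `send_utf8_header()` (or the plain `header()` call above) has to run before any HTML is echoed, otherwise the header cannot be changed.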