
Convert html entities to UTF-8, but keep existing UTF-8


I want to convert HTML entities to UTF-8, but mb_convert_encoding() destroys characters that are already UTF-8 encoded. What's the correct way?

$text = "äöü &auml; &ouml; &uuml; &szlig;";
var_dump(mb_convert_encoding($text, 'UTF-8', 'HTML-ENTITIES'));
// string(24) "Ã¤Ã¶Ã¼ ä ö ü ß"
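A minimal sketch, assuming the source file is saved as UTF-8 and the mbstring extension is loaded. The 24-byte result above is consistent with each byte of the already-encoded "äöü" being read as one ISO-8859-1 character and re-encoded as UTF-8 (6 bytes become 12), while the four entities decode correctly (4 spaces + 4 × 2 bytes = 12). The snippet reproduces that double encoding with an explicit ISO-8859-1 to UTF-8 conversion; it is a stand-in for illustration, not the same call as the HTML-ENTITIES one in the question.

```php
<?php
// Reproduce the double encoding ("mojibake") from the question.
// Assumption: this file is saved as UTF-8, so "äöü" is 6 bytes: C3 A4 C3 B6 C3 BC.
$utf8 = "äöü";

// Treat every input byte as an ISO-8859-1 character and re-encode it as UTF-8.
// C3 becomes "Ã", A4 becomes "¤", B6 becomes "¶", BC becomes "¼" (2 bytes each).
$doubled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

var_dump(strlen($utf8)); // int(6)
var_dump($doubled);      // string(12) "Ã¤Ã¶Ã¼"
```

The existing UTF-8 text doubles in size and turns into "Ã¤Ã¶Ã¼", which is exactly the damage seen in the question's output.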

Solution

  • mb_convert_encoding() isn't the right function for this. Use html_entity_decode() instead: it converts only the actual HTML entities to UTF-8 and leaves the existing UTF-8 characters in the string untouched.

    $text = "äöü &auml; &ouml; &uuml; &szlig;";
    var_dump(html_entity_decode($text, ENT_COMPAT | ENT_HTML401, 'UTF-8'));
    

    which gives

    string(18) "äöü ä ö ü ß"
    

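The same call can be wrapped in a small helper. This is a sketch only: the name decodeEntitiesKeepUtf8 is hypothetical, and the flags are deliberately swapped from the answer's ENT_COMPAT | ENT_HTML401 to ENT_QUOTES | ENT_HTML5, so that single-quote entities (&#039;) and HTML5-only names such as &apos; are decoded as well. Keep the answer's flags if you want strict HTML 4.01 behaviour.

```php
<?php
// Hypothetical helper: decode HTML entities to UTF-8 and leave existing
// UTF-8 characters untouched.
function decodeEntitiesKeepUtf8(string $text): string
{
    // ENT_QUOTES also decodes single-quote entities (ENT_COMPAT leaves them alone);
    // ENT_HTML5 uses the larger HTML5 named-entity table, which includes &apos;.
    return html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

var_dump(decodeEntitiesKeepUtf8("äöü &auml; &ouml; &uuml; &szlig;")); // string(18) "äöü ä ö ü ß"
var_dump(decodeEntitiesKeepUtf8("it&apos;s"));                        // string(4) "it's"
```

Both the literal "äöü" and the decoded entities end up as 2-byte UTF-8 sequences, which is why the result is 18 bytes rather than the 24 produced by mb_convert_encoding().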