Tags: apache, utf-8, character-encoding, iso-8859-1, x-sendfile

Apache 2.4 + XSendfile appends charset=utf-8 to content-type header


We just migrated one of our web applications (classic LAMP stack) from Ubuntu 14/Apache 2.2/PHP 5.5 to Ubuntu 16/Apache 2.4/PHP 7.0. Everything went pretty smoothly. There is only one part that is giving me headaches at the moment:

One route in our application checks the incoming file request and delivers that file via XSendfile (third-party content). The content-type for the file delivery is set explicitly by the application. This means, for example, that we send a content-type of text/html for *.html files. This worked really well on the old machine, and the browser received that exact content type.

Now, on the new machine (same code and same files to deliver), the content-type received by the browser is text/html;charset=utf-8. Charset metadata is being appended to the content-type.

This is pretty bad for us, because some of the files contain iso-8859-1 encoded content. The result is encoding errors in the displayed html file.

I have triple-checked everything. The content-type we are placing in the HTTP headers is explicitly set without the utf-8 charset metadata.

Right now I am checking various Apache2/XSendfile specific configurations for any hints about that behaviour.

Has anybody else experienced similar behaviour on Apache 2.4 using XSendfile? Is it correct that the content-type sent by Apache takes precedence over the one in the html meta tag? How can I disable the automatic appending of charset information to the content-type header?


Solution

  • I ran into this exact same problem, which caused havoc in an Android app expecting the text/html MIME type verbatim, and found out that it is actually caused by a PHP configuration default that changed in an update, not by Apache or XSendfile. In PHP 5.6, the default value of the default_charset setting changed from "" to "UTF-8".

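    You can either change this in the appropriate php.ini, or simply add ini_set('default_charset', ''); before your header() calls as a quick workaround. (An empty string is used here because ini_set() expects a string value; passing NULL has the same effect on PHP 7.0 but is deprecated in newer PHP versions.)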

    default_charset string

    In PHP 5.6 onwards, "UTF-8" is the default value and its value is used as the default character encoding for htmlentities(), html_entity_decode() and htmlspecialchars() if the encoding parameter is omitted...

    All versions of PHP will use this value as the charset within the default Content-Type header sent by PHP if the header isn't overridden by a call to header().

    Ref: http://php.net/manual/en/ini.core.php#ini.default-charset
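
To make the workaround concrete, here is a minimal sketch of how the delivery route could look. The function names, parameters and file path are hypothetical and not taken from the original application. The optional charset argument is an assumption on my part: as far as I know, PHP only appends default_charset to text/* content types that do not already carry a charset, so naming the charset explicitly for known iso-8859-1 files should also avoid the problem.

```php
<?php
// Hypothetical helper: builds the header lines for an X-Sendfile delivery.
// Pass a charset only when the file's encoding is actually known.
function build_sendfile_headers(string $path, string $mimeType, string $charset = ''): array
{
    $contentType = $mimeType;
    if ($charset !== '') {
        $contentType .= '; charset=' . $charset;
    }

    return [
        'Content-Type: ' . $contentType,
        'X-Sendfile: ' . $path,
    ];
}

// Hypothetical delivery function: clears default_charset, then sends the headers.
function send_file_via_xsendfile(string $path, string $mimeType, string $charset = '')
{
    // With an empty default_charset, PHP no longer appends "charset=UTF-8"
    // to text/* content types set through header().
    ini_set('default_charset', '');

    foreach (build_sendfile_headers($path, $mimeType, $charset) as $line) {
        header($line);
    }
}
```

In the route this would be called as, for example, send_file_via_xsendfile('/path/to/file.html', 'text/html'); to get a bare text/html, or with 'ISO-8859-1' as the third argument for files known to use that encoding.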