php html css utf-8 iso-8859-1

Two almost identical HTML/CSS pages render differently


Hi,

I'm upgrading a PHP 5 webpage to PHP 8. As part of the update, I'm also switching the character encoding from charset=iso-8859-1 to charset=utf-8, and have converted the files to UTF-8 without BOM.

However, after the conversion, the converted page renders significantly longer, and I can't figure out why. No HTML has been changed (except the charset). I did notice, when viewing the source code in Firefox, that Firefox underlines both HTML and HEAD in red when the charset is set to UTF-8 (see images below).

I also had a look at a similar post (Identical HTML/CSS looks differently) and I tried separating the content and the charset but with the same result as before.

Here are the links to the original page and the upgraded page:

Original page: (charset=iso-8859-1): https://www.bokaochplanera.se
Upgraded page (charset=utf-8): https://bokaochplanera-dev.trampolinfilm.se/

Image Source code - Original page: Firefox underlines DOCTYPE in red
Image Source code - Upgraded page: Firefox underlines DOCTYPE, HTML and HEAD in red

What am I missing!? Thanks!


Solution

  • After saving the pages and opening them in a hex editor, I can see that no Byte Order Mark (BOM) exists on the original page with content type iso-8859-1.

    However, on your development page with content type set to utf-8, a BOM exists.
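
The hex editor check above can also be done in a few lines of code. The following is a minimal Node.js sketch, not the method used for the check in this answer; the helper names `startsWithUtf8Bom` and `hexPreview` and the sample bytes are made up for illustration.

```javascript
// UTF-8 encodes the byte order mark U+FEFF as the three bytes EF BB BF.
const UTF8_BOM = [0xef, 0xbb, 0xbf];

// Returns true when the given bytes (Buffer or Uint8Array) start with a UTF-8 BOM.
function startsWithUtf8Bom(bytes) {
  return bytes.length >= 3 && UTF8_BOM.every((b, i) => bytes[i] === b);
}

// Formats the first few bytes as hex, similar to the first row in a hex editor.
function hexPreview(bytes, count = 8) {
  return Array.from(bytes.subarray(0, count), (b) =>
    b.toString(16).padStart(2, "0")
  ).join(" ");
}

// Hypothetical sample: a BOM followed by the first bytes of "<!DOCTYPE".
const sample = Buffer.from([0xef, 0xbb, 0xbf, 0x3c, 0x21, 0x44, 0x4f, 0x43]);
console.log(hexPreview(sample), startsWithUtf8Bom(sample));
```

To check a saved copy of a page, the bytes could come from `require("fs").readFileSync("saved-page.html")`, where the file name is a placeholder.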

    The browser on the development domain is showing as being in quirks mode. A BOM on its own does not generally cause issues; it is the combination of the BOM and something else (such as specific characters in the page) that puts the browser into quirks mode. You can check whether the browser is in quirks mode by opening the developer tools and entering document.compatMode in the console. If it shows BackCompat, quirks mode is active. Some browsers, like Firefox, also put a message directly in the Console when quirks mode is active:

    This page is in Quirks Mode. Page layout may be impacted. For Standards Mode use “<!DOCTYPE html>”.
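
As a small illustration of the document.compatMode check, the sketch below wraps it in a helper that can be pasted into the browser console. The function name `describeRenderingMode` is an assumption made for this example; only `document.compatMode` and its values `BackCompat` and `CSS1Compat` come from the browser.

```javascript
// Maps document.compatMode to a readable label.
// "BackCompat" means quirks mode; "CSS1Compat" is reported otherwise.
function describeRenderingMode(doc) {
  return doc.compatMode === "BackCompat" ? "quirks mode" : "standards mode";
}

// In a browser console, pass the real document object. Outside a browser
// (for example in Node.js) there is no document, so nothing is logged.
if (typeof document !== "undefined") {
  console.log(describeRenderingMode(document));
}
```

Running this on both the original and the upgraded page should make the difference between the two visible without reading the Console warnings.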

    You will want to remove the BOM and make sure that neither your editor or IDE nor any tool-chain you use re-adds it.
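
One possible way to remove BOMs in bulk is a small Node.js script like the sketch below. It assumes the site files live in a local directory and that only `.php`, `.html` and `.css` files matter; the function names and the extension list are illustrative choices, not part of any existing tool. It rewrites files in place, so it is best run on a copy or under version control.

```javascript
const fs = require("fs");
const path = require("path");

// Returns true when the buffer starts with the UTF-8 BOM bytes EF BB BF.
function hasUtf8Bom(buf) {
  return buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf;
}

// Recursively walks `dir`, strips a leading BOM from files whose extension is
// in `extensions`, and returns the paths of the files that were rewritten.
function stripBomInDir(dir, extensions = [".php", ".html", ".css"]) {
  const changed = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      changed.push(...stripBomInDir(full, extensions));
    } else if (extensions.includes(path.extname(entry.name).toLowerCase())) {
      const buf = fs.readFileSync(full);
      if (hasUtf8Bom(buf)) {
        fs.writeFileSync(full, buf.subarray(3)); // keep everything after the BOM
        changed.push(full);
      }
    }
  }
  return changed;
}
```

Calling `stripBomInDir("./site")` (the path is a placeholder) returns the list of files that had a BOM, which also shows which files an editor or build step keeps re-saving with one.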

    In the end, I find that it is much easier to make sure a BOM does not exist than to try to deal with the variations that don't trigger quirks mode. Since the BOM characters are not visible by default in most text editors or IDEs, it can be difficult to know whether a BOM is there, or whether it will combine with something else to trigger quirks mode.

    Lastly, given that some software other than browsers can complain about or not understand BOMs, it is generally better to make sure it's not there. Also make sure you have the right content type set and are not serving characters that do not conform to the selected content type (e.g. if using UTF-8, make sure all the content is UTF-8 and not another encoding).
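
To illustrate the last point, here is a hedged sketch of how content could be checked for UTF-8 validity with the standard `TextDecoder` API. The helper name `isValidUtf8` and the sample bytes are assumptions for this example. Note that a passing result only shows the bytes are well-formed UTF-8, not that they were originally written in that encoding.

```javascript
// Returns true when `bytes` decode cleanly as UTF-8. With { fatal: true },
// TextDecoder throws a TypeError on malformed input instead of silently
// substituting the replacement character U+FFFD.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (err) {
    return false;
  }
}

// "å" is the two bytes C3 A5 in UTF-8 but the single byte E5 in ISO-8859-1.
// Both arrays spell "boka å", once in each encoding.
const asUtf8 = Uint8Array.from([0x62, 0x6f, 0x6b, 0x61, 0x20, 0xc3, 0xa5]);
const asLatin1 = Uint8Array.from([0x62, 0x6f, 0x6b, 0x61, 0x20, 0xe5]);
console.log(isValidUtf8(asUtf8), isValidUtf8(asLatin1));
```

A file left in ISO-8859-1 that contains characters such as å, ä or ö will typically fail this check, which makes it a quick way to spot files that were missed during the conversion.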