
Validation error for HTML5 document using windows-1252 encoding


The w3.org HTML validator https://validator.w3.org unexpectedly shows an error for an HTML5 document which declares the encoding windows-1252.

The message is:

Error: Legacy encoding windows-1252 used. Documents must use UTF-8.

I found many resources which only say that UTF-8 is the recommended encoding, not that it is mandatory. Did I miss something, or is the w3 validator error message questionable?


Example:

<!DOCTYPE html>
<html lang="fr">
  <head>
    <META http-equiv="Content-Type" content="text/html; charset=windows-1252">
    <title>Untitled document</title>
  </head>
  <body>
    <p>
       à la mode 
    </p>
  </body>
</html>

Solution

  • In HTML5, to specify the document's charset, you should use the newer <meta charset="..."> instead of the older <meta http-equiv="Content-Type" content="text/html; charset=..."> (which is still supported for legacy reasons), e.g.:

    <!DOCTYPE html>
    <html lang="fr">
      <head>
        <meta charset="windows-1252">
        <title>Untitled document</title>
      </head>
      <body>
        <p>
           à la mode 
        </p>
      </body>
    </html>
    

    That being said, the current HTML5 standard states:

    4.2.5.4 Specifying the document's character encoding

    A character encoding declaration is a mechanism by which the character encoding used to store or transmit a document is specified.

    The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it. Those requirements necessitate that the document's character encoding declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8.

    As well as:

    Encoding declaration state (http-equiv="content-type")

    The Encoding declaration state is just an alternative form of setting the charset attribute: it is a character encoding declaration. This state's user agent requirements are all handled by the parsing section of the specification.

    For meta elements with an http-equiv attribute in the Encoding declaration state, the content attribute must have a value that is an ASCII case-insensitive match for a string that consists of: the literal string "text/html;", optionally followed by any number of ASCII whitespace, followed by the literal string "charset=utf-8".

    This means the validator is correct. Modern HTML no longer allows Windows-1252/ISO-8859-1 (or any other charset); only UTF-8 is permitted. It is not just a recommendation, it is a requirement.

    Whether or not particular web browsers choose to enforce this requirement is another matter, though...
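
    For reference, a UTF-8 version of the example from the question might look like the sketch below. This assumes the file itself is also re-saved as UTF-8 in your editor; changing only the declaration while the bytes on disk are still windows-1252 would most likely make the "à" render incorrectly.

    <!DOCTYPE html>
    <html lang="fr">
      <head>
        <meta charset="utf-8">
        <title>Untitled document</title>
      </head>
      <body>
        <p>
           à la mode
        </p>
      </body>
    </html>

    With both the declaration and the actual file encoding set to UTF-8, the "Legacy encoding" error should no longer apply.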