Tags: php, mysql, internationalization, phpmyadmin

Why does everyone use latin1?


Someone just said utf8 has variable length encoding from 1 to 3 bytes.

So why does everyone still use latin1? If the same thing is stored in utf8 it is also 1 byte, but utf8 has the advantage that it can adapt to a larger character set.


Solution

  • ISO 8859-1 is the default character encoding (at least de facto) in several standards, such as HTTP (at least for textual content):

    When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value.

    ISO 8859-1 was probably chosen because it is a superset of US-ASCII, the fundamental character set of Internet-based technologies. And since the World Wide Web was invented and developed at CERN in Geneva, Switzerland, that might explain why characters of Western European languages were chosen for the remaining 128 characters.

    When the Unicode standard was developed, the character set of ISO 8859-1 was used as the basis of the Unicode character set (the Universal Character Set), so that the first 256 characters are identical to those of ISO 8859-1. This was probably done because of the importance of ISO 8859-1 for the Web, as it was already the standard character encoding for many technologies.

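    Now, to discuss the advantages of ISO 8859-1 compared with UTF-8, we need to look at the underlying character sets and the encoding schemes that are used to encode these characters: ISO 8859-1 covers 256 characters and encodes each of them as a single byte, while UTF-8 covers the whole Unicode character set and encodes each character as a variable number of bytes.

    So the difference is the range of encodable characters on the one hand and the length of the encoded word on the other hand.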

    So the choice of the “right” character encoding depends on your needs: If you only need the characters of ISO 8859-1 (or US-ASCII as a subset of it), use ISO 8859-1, as it requires only one byte for each character, whereas UTF-8 requires two bytes for each of the characters 128–255. And if you need more or other characters than those in ISO 8859-1, use UTF-8.
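The size trade-off described above can be checked with a minimal Python sketch. It assumes Python's built-in `latin-1` and `utf-8` codecs as stand-ins for the two encodings. The helper name `encoded_sizes` and the sample strings are made up for illustration and do not come from any database or library API.

```python
def encoded_sizes(text):
    """Return the byte length of `text` per encoding, or None if a
    character cannot be represented in that encoding."""
    sizes = {}
    for encoding in ("latin-1", "utf-8"):
        try:
            sizes[encoding] = len(text.encode(encoding))
        except UnicodeEncodeError:
            sizes[encoding] = None
    return sizes


# Hypothetical samples: pure ASCII, one character from the 128-255 range
# (e with acute accent, U+00E9), and one character outside ISO 8859-1
# (euro sign, U+20AC).
samples = ["hello", "caf\u00e9", "\u20ac"]

for sample in samples:
    # ascii() keeps the output printable on any terminal encoding.
    print(ascii(sample), encoded_sizes(sample))

# The first 256 Unicode code points coincide with the ISO 8859-1 byte values.
assert bytes(range(256)).decode("latin-1") == "".join(chr(i) for i in range(256))
```

ASCII-only text takes the same space in both encodings. A character from the 128–255 range costs one extra byte in UTF-8, and a character outside ISO 8859-1 cannot be stored in it at all.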