encoding character-encoding

What's the difference between encoding and charset?


I am confused about text encodings and charsets. For various reasons, I have to learn non-Unicode, non-UTF-8 material for my upcoming work.

I see the word "charset" in email headers, as in "ISO-2022-JP", but I can't find any such encoding in text editors. (I looked through several different text editors.)

What's the difference between text encoding and charset? I'd appreciate it if you could show me some use case examples.


Solution

  • Basically:

    1. A charset is the set of characters you can use.
    2. An encoding is the way these characters are represented as bytes in memory or in a file.

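    People sometimes use "charset" to refer to both the character repertoire and the encoding scheme. Email (MIME) headers do this: charset="ISO-2022-JP" names an encoding, not just a set of characters. The Unicode character set has multiple encodings, e.g., UTF-8, UTF-16, UTF-32 (essentially the same as UCS-4), and UTF-EBCDIC. Punycode and GB18030 can also represent every Unicode code point.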
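
    To make the distinction concrete, here is a minimal Python sketch. The sample string and the helper name can_encode are only for illustration; the codec names are the ones Python's standard library uses (for example, iso2022_jp for ISO-2022-JP). It encodes the same three characters with several encodings, then checks which charsets can hold those characters at all.

```python
# The same characters, stored as different bytes depending on the encoding.
text = "日本語"

encoded = {
    name: text.encode(name)
    for name in ("utf-8", "utf-16-le", "utf-32-le", "iso2022_jp")
}

for name, data in encoded.items():
    print(f"{name:10} {len(data):2d} bytes  {data!r}")

# Decoding with the matching encoding recovers the original characters.
roundtrip_ok = all(data.decode(name) == text for name, data in encoded.items())


def can_encode(s, encoding):
    """Return True if every character of s exists in the encoding's charset."""
    try:
        s.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The charset limits which characters are available in the first place.
print(can_encode("abc", "ascii"))      # ASCII contains the Latin letters
print(can_encode(text, "ascii"))       # ASCII has no Japanese characters
print(can_encode(text, "iso2022_jp"))  # ISO-2022-JP does
```

    The first loop shows the encoding side: one repertoire of characters, several byte layouts. The can_encode calls show the charset side: if a character is not in the repertoire, no byte sequence exists for it in that encoding.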