javaunicodeutf-8utf-16utf

Difference between UTF-8 and UTF-16?


Difference between UTF-8 and UTF-16? Why do we need these?

MessageDigest md = MessageDigest.getInstance("SHA-256");
String text = "This is some text";

md.update(text.getBytes("UTF-8")); // Change this to "UTF-16" if needed
byte[] digest = md.digest();

Solution

  • I believe there are a lot of good articles about this around the Web, but here is a short summary.

    Both UTF-8 and UTF-16 are variable length encodings. However, in UTF-8 a character may occupy a minimum of 8 bits, while in UTF-16 character length starts with 16 bits.

    Main UTF-8 pros:

    Main UTF-8 cons:

    Main UTF-16 pros:

    Main UTF-16 cons:

    In general, UTF-16 is usually better for in-memory representation because BE/LE is irrelevant there (just use native order) and indexing is faster (just don't forget to handle surrogate pairs properly). UTF-8, on the other hand, is extremely good for text files and network protocols because there is no BE/LE issue and null-termination often comes in handy, as well as ASCII-compatibility.